Abstract: There is a fundamental relationship between belief
propagation and maximum a posteriori decoding. The case of 
transmission over the binary erasure channel was investigated in 
detail in a companion paper. This paper investigates the extension 
to general memoryless channels (paying special attention to the 
binary case). An area theorem for transmission over general 
memoryless channels is introduced and some of its many consequences
are discussed. We show that this area theorem gives rise
to an upper-bound on the maximum a posteriori threshold for 
sparse graph codes. In situations where this bound is tight, the 
extrinsic soft bit estimates delivered by the belief propagation 
decoder coincide with the correct a posteriori probabilities 
above the maximum a posteriori threshold. More generally, it 
is conjectured that the fundamental relationship between the 
maximum a posteriori and the belief propagation decoder which 
was observed for transmission over the binary erasure channel 
carries over to the general case. We finally demonstrate that in 
order for the design rate of an ensemble to approach the capacity 
under belief propagation decoding the component codes have to 
be perfectly matched, a statement which is well known for the 
special case of transmission over the binary erasure channel. 

Index Terms: belief propagation, maximum a posteriori, maximum
likelihood, Maxwell construction, threshold, phase transition,
area theorem, EXIT curve, entropy



I. Introduction 

It was shown in [3]-[5] that, when transmission takes place
over the binary erasure channel (BEC) using sparse graph
codes, there exists a surprising and fundamental relationship
between the belief propagation (BP) and the maximum a
posteriori (MAP) decoder. This relationship emerges in the
limit of large blocklengths. Operationally, this relationship is
furnished for the BEC by the so-called Maxwell decoder. This
decoder bridges the gap between BP and MAP decoding by
augmenting the BP decoder with an additional "guessing"
device. Analytically, the relationship between BP and MAP
decoding is given in terms of the so-called extended BP EXIT
(EBP EXIT) function. Fig. 1 shows this curve (double "S"-shaped
curve) for transmission over the BEC and the ensemble
LDPC$\left(\frac{3x+3x^2+4x^{13}}{10}, x^6\right)$ (the degree distributions are from an
edge perspective). The BP EXIT curve is the "envelope" of the
EBP EXIT curve (let a ball run slowly down the slope). The
MAP EXIT curve, on the other hand, is conjectured to be derived
in general from the EBP EXIT curve by the so-called Maxwell
construction. This Maxwell construction consists of converting
the EBP EXIT curve into a single-valued function by "cutting"
the EBP EXIT curve at the two "S"-shaped spots in such a
way that there is a local balance of the cut areas. A detailed




Fig. 1. The EBP EXIT curve (double "S"-shaped curve), the corresponding 
BP EXIT curve (dashed and solid line; the "envelope" of the EBP EXIT curve) 
and the MAP EXIT curve (thick solid line; constructed by "cutting" EBP
EXIT at the two "S"-shaped spots in such a way that there is a local balance 
of the areas shown in gray) for the ensemble LDPC$\left(\frac{3x+3x^2+4x^{13}}{10}, x^6\right)$.



discussion of this relationship in the case of transmission 
over the BEC can be found in [5]. Let us summarize. For 
transmission over the BEC using sparse graph codes from long 
ensembles, BP decoding is asymptotically characterized by its 
BP EXIT curve and MAP decoding is characterized by its 
MAP EXIT curve. These two curves are linked via the EBP 
EXIT curve. 

A. Overview of Results 

The pleasing picture shown in Fig. 1 seems to have a fairly
complete analog in the general setting. Unfortunately we are 
not able to prove this claim in any generality. But we show 
how several of the key ingredients can be suitably extended 
to the general case and we will be able to prove some of their 
fundamental properties. 

Namely, we introduce a general area theorem (GAT). This 
area theorem, when applied to the BEC, leads back to the 
notion of EXIT functions as shown in the companion paper [5]. 
For the general case however, it is necessary to use a distinct 
function (but similar in many respects to EXIT). We call it 
the generalized EXIT (GEXIT) function. We then show that 
GEXIT functions share some of the key properties with EXIT 
functions. In particular, we are able to extend the upper-bound 
on the MAP threshold presented in [3] (or, more generally, the 
lower bound on the conditional entropy) to general channels. 

In [6], [7] Guo, Shamai and Verdú showed that for Gaussian
channels the derivative (with respect to the signal-to-noise
ratio) of the mutual information is equal to the mean square
error (MSE), and in [6] they showed that a similar relationship
holds for Poisson channels. One can think of GEXIT functions
as providing such a relationship in a more general setting
(where the generalization is with respect to the admissible
channel families). For some channel families, GEXIT functions
have particularly nice interpretations. E.g., for Gaussian
channels, we not only have the interpretation of the derivative 
in terms of the MSE detector, but this interpretation can be 
simplified even further in the binary case: the derivative of 
the mutual information can be seen as the "magnetization" 
of the system as was shown by Macris in [8]. The results 
in [9], which have appeared since the introduction of GEXIT 
functions in [1], can be reformulated to give an interpretation 
of GEXIT functions for the class of additive channels (see 
also [10]). It is likely that interpretations for other classes of 
channels will be found in the future. 



B. Paper Outline 

In Section II we review the necessary background material
and in particular recall the GAT first stated in [1]. Starting
from this GAT, we introduce GEXIT functions in Section III.
We will see that for transmission over the BEC, GEXIT
functions coincide with standard EXIT functions, but that
this is no longer true for general channels. In Section IV
we then concentrate on LDPC ensembles. In particular we
define the quantities which appear in the asymptotic setting.
In Section V we then prove one of the fundamental properties
of GEXIT functions, namely that GEXIT kernels preserve the
ordering implied by physical degradation. This fact is then
exploited in Section VI where we show how to compute
an upper bound on the threshold under MAP decoding (or,
more generally, a lower bound on the conditional entropy) by
considering the BP GEXIT function, which results from the
regular GEXIT function if we substitute the MAP density by
its equivalent BP density. In Section VII we define extended
BP GEXIT (EBP GEXIT) functions which include the unstable
branches, present several examples of these functions and
discuss how they provide a bridge between belief propagation
and maximum a posteriori decoding. Several properties of
EBP GEXIT functions are discussed in Section VIII together
with a numerical procedure for constructing them. We show
that they satisfy an area theorem as well. Section IX presents
some partial results on the smoothness and uniqueness of EBP
GEXIT functions. In Section X we show the surprising fact
that, in case the previously computed upper bound on the
MAP threshold is tight, the a posteriori probabilities on
the bits are equal to the corresponding BP estimates. Section
XI contains a proof that iterative coding systems cannot
achieve reliable communication above capacity, using only
density evolution and the area theorem (and not the standard
Fano inequality). A matching condition for component codes
of capacity achieving sequences follows. In the appendices
we collect some technical derivations and a discussion of
several equivalent forms of the GEXIT functions for Gaussian
channels. We finally conclude with some remarks in Section
XII.



II. Review and Notations 

Let $\mathcal{X}$ denote the channel input alphabet (which we always
assume finite) and $\mathcal{Y}$ the channel output alphabet (typically,
$\mathcal{Y} = \mathbb{R}$). All channels considered in this paper are memoryless
(M). Rather than looking at a single memoryless channel,
we usually consider families of memoryless channels parameterized
by a real-valued parameter $\epsilon$, which we denote by
$\{\mathrm{M}(\epsilon)\}_{\epsilon}$. Each channel from such a family is characterized by
its transition probability density $p_{Y \mid X}(y \mid x)$ (where $x \in \mathcal{X}$
and $y \in \mathcal{Y}$). We adopt here the convention of formally
denoting channels by their transition density even when such
a density does not exist, and write $\int f(y)\, p_{Y \mid X}(y \mid x)\, \mathrm{d}y$ as a
proxy for the corresponding expectation.

Transmission over binary-input memoryless output-symmetric$^1$
(BMS) channels plays a particularly important
role. In this case, it will be convenient to assume that the
input bit $X_i$ takes values $x_i \in \mathcal{X} = \{+1,-1\}$. The channel
indexed by parameter $\epsilon$ is generically denoted by BMS($\epsilon$).

In the sequel we will often assume that the channel family
$\{\mathrm{BMS}(\epsilon)\}_{\epsilon}$ is ordered by physical degradation (see [11] for a
discussion of this concept). It is well known that the standard
families $\{\mathrm{BEC}(\epsilon)\}_{\epsilon=0}^{1}$ (binary erasure channels with erasure
parameter $\epsilon$), $\{\mathrm{BSC}(\epsilon)\}_{\epsilon=0}^{1/2}$ (binary symmetric channels with
cross-over probability $\epsilon$), and $\{\mathrm{BAWGNC}(\sigma)\}_{\sigma=0}^{\infty}$ (binary-input
additive white Gaussian noise channels $Y = X + N$
where $X$ takes values in $\mathcal{X}$ and the noise $N$ has standard
deviation $\sigma$ and zero mean) all have this property. For notational
simplicity we will use a shorthand and say that a channel
family is degraded.

In the binary case, an important role is played by the
distribution of the log-likelihood ratio $L = \log \frac{p_{Y \mid X}(Y \mid +1)}{p_{Y \mid X}(Y \mid -1)}$,
assuming $X = 1$. We denote the corresponding density
by $c(l)$ and call it an $L$-density. In fact, without loss of
generality we can assume that the log-likelihood ratio ($L$)
mapping, $y \mapsto \log \frac{p_{Y \mid X}(y \mid +1)}{p_{Y \mid X}(y \mid -1)}$, is already included in the
channel description. This is justified since the random variable
$L$ constitutes a sufficient statistic. This inclusion of the $L$-processing
is equivalent to assuming that $p_{Y \mid X}(y \mid +1) = e^{y}\, p_{Y \mid X}(y \mid -1)$.
Further facts regarding BMS channels can be found in [11].
As far as LDPC and iterative coding systems are concerned,
we will keep the formalism introduced in the companion paper
[5], which is found, e.g., in [12]-[15].

In the case of a non-binary input alphabet $\mathcal{X}$,
the log-likelihood mapping will be replaced by the
'canonical' representation of the channel output
$y \mapsto \nu(y) = \{p_{Y \mid X}(y \mid x) / z(y) : x \in \mathcal{X}\}$, where
$z(y) = \sum_{x \in \mathcal{X}} p_{Y \mid X}(y \mid x)$. Notice that $\nu(y)$ belongs to
the $(|\mathcal{X}| - 1)$-dimensional simplex $S_{|\mathcal{X}|-1}$. In the binary case,
the log-likelihood ratio is just a particular parametrization of
the one-dimensional simplex.

In what follows we will often be concerned with how certain
quantities (e.g., the conditional entropy $H(X \mid Y)$) behave as
we change the channel parameter. In order to ensure that

$^1$ A binary memoryless channel is said to be symmetric (or, more precisely,
output-symmetric) when the transition probability satisfies
$p_{Y \mid X}(y \mid +1) = p_{Y \mid X}(-y \mid -1)$.
the involved objects exist we need to impose some regularity
conditions on the channel family with respect to the channel
parameter. This can be done in various ways, but to be concrete
we will impose the following restriction.

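Definition 1 (Channel Smoothness): Consider a family of
memoryless channels with input and output alphabets $\mathcal{X}$ and
$\mathcal{Y}$, respectively, and characterized by their transition probability
$p_{Y \mid X}(\nu \mid x)$ (with $\nu$ taking the canonical form described
above). Assume that the family is parameterized by $\epsilon$, where
$\epsilon$ takes values in some interval $I \subseteq \mathbb{R}$. The channel family is
said to be smooth with respect to the parameter $\epsilon$ if for all
$x \in \mathcal{X}$ and all bounded continuously differentiable functions
$f(\nu)$ on $S_{|\mathcal{X}|-1}$, the integral $\int f(\nu)\, p_{Y \mid X}(\nu \mid x)\, \mathrm{d}\nu$ exists and is
a continuously differentiable function with respect to $\epsilon$, $\epsilon \in I$.

In the sequel we often say as a shorthand that a channel
BMS($\epsilon$) is smooth to mean that we are transmitting over the
channel BMS($\epsilon$) and that the channel family $\{\mathrm{BMS}(\epsilon)\}_{\epsilon}$ is
smooth at the point $\epsilon$. If BMS($\epsilon$) is smooth, the derivative
$\frac{\mathrm{d}}{\mathrm{d}\epsilon} \int f(\nu)\, p_{Y \mid X}(\nu \mid x)\, \mathrm{d}\nu$ exists and is a linear functional of $f$.
It is therefore consistent to formally define the derivative of
$p_{Y \mid X}(\nu \mid x)$ with respect to $\epsilon$ by setting

\[ \int f(\nu)\, \frac{\mathrm{d} p_{Y \mid X}(\nu \mid x)}{\mathrm{d}\epsilon}\, \mathrm{d}\nu \triangleq \frac{\mathrm{d}}{\mathrm{d}\epsilon} \int f(\nu)\, p_{Y \mid X}(\nu \mid x)\, \mathrm{d}\nu. \qquad (1) \]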

For a large class of channel families it is straightforward to
check that they are smooth. This is, e.g., the case if $\{\nu(y)\}$ is finite
and the transition probabilities are differentiable functions of $\epsilon$,
or if the channel admits a density with respect to the Lebesgue measure,
and the density is differentiable for each $y$. In these cases, the
formal derivative coincides with the ordinary derivative.

Example 1 (Smooth Channels): It is straightforward to
check that the families $\{\mathrm{BEC}(\epsilon)\}_{\epsilon=0}^{1}$, $\{\mathrm{BSC}(\epsilon)\}_{\epsilon=0}^{1/2}$, and
$\{\mathrm{BAWGNC}(\sigma)\}_{\sigma=0}^{\infty}$ are all smooth.

In the case of transmission over a BMS channel it is useful
to parameterize the channels in such a way that the parameter
reflects the channel entropy. More precisely, we denote by
$\mathtt{h}$ the conditional entropy $H(X \mid Y)$ when the channel input
$X$ is chosen uniformly at random from $\{+1,-1\}$, and the
corresponding output is $Y$. Consider a family of BMS channels
characterized by their $L$-densities. We then write this family of
$L$-densities as $\{c_{\mathtt{h}}\}_{\mathtt{h}}$ if $H(c_{\mathtt{h}}) = \mathtt{h}$, where the entropy operator
is defined as (see, e.g., [11])

\[ H(c) \triangleq \int_{-\infty}^{+\infty} c(y) \log_2(1 + e^{-y})\, \mathrm{d}y = \int_{-\infty}^{+\infty} c(y)\, l(y)\, \mathrm{d}y. \qquad (2) \]

This integral always exists as can be seen by writing
it in the equivalent form as the Riemann-Stieltjes integral
$\int_0^{\infty} h_2\!\left(\frac{1}{1+e^{y}}\right) \mathrm{d}|C|(y)$. In the above definition we have
introduced the kernel $l(y) \triangleq \log_2(1 + e^{-y})$. For reasons that
will become clearer in Lemma 1 we call $l(y)$ the EXIT kernel.

The channel family is said to be complete if $\mathtt{h}$ ranges from
0 to 1. For the binary erasure channel the natural parameter $\epsilon$
(the erasure probability) already represents an entropy. Nevertheless,
to be consistent we will write in the future BEC($\mathtt{h}$).
By some abuse of notation, we write BSC($\mathtt{h}$) to denote the
BSC with cross-over probability equal to $\epsilon(\mathtt{h}) = h_2^{-1}(\mathtt{h})$,
where $h_2(x) \triangleq -x \log_2(x) - (1 - x) \log_2(1 - x)$ is the binary
entropy function. In the same manner, BAWGNC($\mathtt{h}$) denotes
the BAWGNC with a standard deviation of the noise such that
the channel entropy is equal to $\mathtt{h}$.
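The entropy parameterization can be explored numerically. The following Python sketch is an illustration, not taken from the paper: it evaluates the entropy operator of Eq. (2) for the two-point $L$-density of a BSC and, by a simple midpoint rule, for the Gaussian $L$-density of a BAWGNC (mean $2/\sigma^2$, variance $4/\sigma^2$). The function names, the bisection-based inverse of $h_2$, and the integration grid are assumptions made for this example.

```python
import math

def h2(x):
    """Binary entropy function in bits."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)

def h2_inv(h, tol=1e-12):
    """Inverse of h2 on [0, 1/2], found by bisection (illustrative helper)."""
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if h2(mid) < h:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def exit_kernel(y):
    """EXIT kernel l(y) = log2(1 + e^{-y}), arranged to avoid overflow."""
    if y >= 0:
        return math.log1p(math.exp(-y)) / math.log(2.0)
    return (-y + math.log1p(math.exp(y))) / math.log(2.0)

def entropy_bsc(eps):
    """H(c) for BSC(eps), 0 < eps < 1: point masses at +/- log((1-eps)/eps)."""
    llr = math.log((1.0 - eps) / eps)
    return (1.0 - eps) * exit_kernel(llr) + eps * exit_kernel(-llr)

def entropy_bawgnc(sigma, steps=20000, width=12.0):
    """H(c) for BAWGNC(sigma) via a midpoint rule over +/- `width` standard deviations."""
    mean, std = 2.0 / sigma**2, 2.0 / sigma
    lo = mean - width * std
    dy = 2.0 * width * std / steps
    total = 0.0
    for k in range(steps):
        y = lo + (k + 0.5) * dy
        density = math.exp(-0.5 * ((y - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))
        total += density * exit_kernel(y) * dy
    return total
```

For the BSC the operator reduces to $h_2(\epsilon)$, so `entropy_bsc(h2_inv(h))` should return approximately `h`; for the BAWGNC one would invert `entropy_bawgnc` numerically (for instance by bisection on $\sigma$) to obtain BAWGNC($\mathtt{h}$).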

We will encounter cases where it is useful to allow each bit
of a codeword to be transmitted through a different (family of)
BMS channel(s). By some abuse of notation, we will denote
the $i$-th channel family by $\{\mathrm{BMS}(\mathtt{h}_i)\}_{\mathtt{h}_i}$. A situation in which
this more general view appears naturally is when we consider
punctured ensembles. We can describe this case by assuming
that some bits are passed through an erasure channel with
erasure probability equal to one, whereas the remaining bits
are passed through some other BMS channel. In such cases it is
convenient to assume that all individual families $\{\mathrm{BMS}(\mathtt{h}_i)\}_{\mathtt{h}_i}$
are parameterized in a smooth (differentiable) way by a single
real parameter, call it $\epsilon$, i.e., $\mathtt{h}_i = \mathtt{h}_i(\epsilon)$. In this way, by
changing $\epsilon$ all channels change according to $\mathtt{h}_i(\epsilon)$ and they
describe a path through "channel space".

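The general area theorem (GAT), first introduced in [1],
takes center stage in the remainder of this paper.

Theorem 1 (General Area Theorem): Let $X$ be chosen
with probability $p_X(x)$ from $\mathcal{X}^n$. Let the channel from $X$ to $Y$
be memoryless, where $Y_i$ is the result of passing $X_i$ through
the smooth family $\{\mathrm{M}(\epsilon_i)\}_{\epsilon_i}$, $\epsilon_i \in I_i$. Let $\Omega$ be a further
observation of $X$ so that $p_{\Omega \mid X, Y}(\omega \mid x, y) = p_{\Omega \mid X}(\omega \mid x)$.
Then

\[ \mathrm{d}H(X \mid Y, \Omega) = \sum_{i=1}^{n} \frac{\partial H(X_i \mid Y, \Omega)}{\partial \epsilon_i}\, \mathrm{d}\epsilon_i. \qquad (3) \]

Proof: For $i \in [n]$, the chain rule for entropy gives $H(X \mid Y, \Omega) =
H(X_i \mid Y, \Omega) + H(X_{\sim i} \mid X_i, Y, \Omega)$. We claim that

\[ p(x_{\sim i} \mid x_i, y, \omega) = p(x_{\sim i} \mid x_i, y_{\sim i}, \omega), \qquad (4) \]

which is true since the channel is memoryless and
$p_{\Omega \mid X, Y}(\omega \mid x, y) = p_{\Omega \mid X}(\omega \mid x)$. Furthermore $H(X_i \mid Y, \Omega)$ is
differentiable with respect to $\epsilon_i$ as a consequence of the channel
smoothness (it is straightforward to write the conditional
entropy as expectation of a differentiable kernel, cf. Lemma 2
and remarks below). Therefore, $H(X_{\sim i} \mid X_i, Y, \Omega) =
H(X_{\sim i} \mid X_i, Y_{\sim i}, \Omega)$ and $\frac{\partial H(X \mid Y, \Omega)}{\partial \epsilon_i} = \frac{\partial H(X_i \mid Y, \Omega)}{\partial \epsilon_i}$. From
this the total derivative as stated in (3) follows immediately. ■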
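The key step of the proof, namely that the partial derivative of $H(X \mid Y)$ with respect to the $i$-th channel parameter equals that of $H(X_i \mid Y)$, can be checked by brute force on a tiny code. The sketch below is an illustrative assumption and not part of the paper: it uses the $[3,2,2]$ single parity-check code with bits written as 0/1, one BSC per position, no extra observation $\Omega$, and central finite differences; all function names are hypothetical.

```python
import math
from itertools import product

# [3, 2, 2] single parity-check code with bits in {0, 1} (the paper uses {+1, -1}).
CODE = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]

def joint(eps):
    """Joint distribution p(x, y): uniform codeword, position i sent over BSC(eps[i])."""
    table = {}
    for x in CODE:
        for y in product([0, 1], repeat=len(x)):
            p = 1.0 / len(CODE)
            for xi, yi, e in zip(x, y, eps):
                p *= e if xi != yi else 1.0 - e
            table[(x, y)] = p
    return table

def cond_entropy(eps, project):
    """H(project(X) | Y) in bits; `project` maps a codeword to the quantity of interest."""
    py, pzy = {}, {}
    for (x, y), p in joint(eps).items():
        py[y] = py.get(y, 0.0) + p
        key = (project(x), y)
        pzy[key] = pzy.get(key, 0.0) + p
    return -sum(p * math.log2(p / py[y]) for (z, y), p in pzy.items() if p > 0.0)

def partial(f, eps, i, step=1e-6):
    """Central finite difference of f with respect to eps[i]."""
    up = list(eps)
    up[i] += step
    dn = list(eps)
    dn[i] -= step
    return (f(up) - f(dn)) / (2.0 * step)

def gat_sides(eps):
    """For each i, return (dH(X|Y)/d eps_i, dH(X_i|Y)/d eps_i)."""
    pairs = []
    for i in range(len(eps)):
        lhs = partial(lambda e: cond_entropy(e, lambda x: x), eps, i)
        rhs = partial(lambda e: cond_entropy(e, lambda x, i=i: x[i]), eps, i)
        pairs.append((lhs, rhs))
    return pairs
```

Calling `gat_sides([0.05, 0.1, 0.2])` returns three pairs whose entries should agree up to finite-difference error, even though $H(X \mid Y)$ and $H(X_i \mid Y)$ themselves differ for this code.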



III. GEXIT Functions 

Let $X$ be chosen with probability $p_X(x)$ from $\mathcal{X}^n$. Assume
that the $i$-th component of $X$ is transmitted over a memoryless
erasure channel (not necessarily binary) with erasure probability
$\epsilon_i$, denote it by EC($\epsilon_i$). Then $H(X_i \mid Y) = (1-\epsilon_i) H(X_i \mid Y_i = X_i, Y_{\sim i}) + \epsilon_i H(X_i \mid Y_i = ?, Y_{\sim i}) = \epsilon_i H(X_i \mid Y_{\sim i})$. Apply
equation (3) in Theorem 1 assuming that $\epsilon_i = \epsilon$, $i \in [n]$.
To remind ourselves that $Y$ is a function of the parameter $\epsilon$
we write $Y(\epsilon)$. Then

\[ \frac{1}{n} \frac{\mathrm{d}}{\mathrm{d}\epsilon} H(X \mid Y(\epsilon)) = \frac{1}{n} \sum_{i=1}^{n} H(X_i \mid Y_{\sim i}(\epsilon)). \]

The function $h_i(\epsilon) \triangleq H(X_i \mid Y_{\sim i}(\epsilon))$ is known in the literature
as the EXIT function associated to the $i$-th bit of the given code
and $h(\epsilon) \triangleq \frac{1}{n} \sum_{i=1}^{n} H(X_i \mid Y_{\sim i}(\epsilon))$ is the (average) EXIT
function.$^2$ We conclude that for transmission over EC($\epsilon$),
$h(\epsilon) = \frac{1}{n} \frac{\mathrm{d}}{\mathrm{d}\epsilon} H(X \mid Y(\epsilon))$. If we integrate this relationship with
respect to $\epsilon$ from 0 to 1 and note that $H(X \mid Y(0)) = 0$ and
$H(X \mid Y(1)) = H(X)$, then we get the basic form of the area
theorem for the EC($\epsilon$): $\int_0^1 h(\epsilon)\, \mathrm{d}\epsilon = H(X)/n$. This statement
was first proved, in the binary case, by Ashikhmin, Kramer,
and ten Brink in [16] using a different framework.

Example 2 (Area Theorem for Repetition Code and BEC):
Consider the binary repetition code with parameters $[n, 1, n]$,
where the first component describes the blocklength, the
second component denotes the dimension of the code, and
the final component gives the minimum (Hamming) distance.
By symmetry $h_i(\mathtt{h}) = h(\mathtt{h}) = \mathtt{h}^{n-1}$ for all $i \in [n]$. We have
$\int_0^1 h(\mathtt{h})\, \mathrm{d}\mathtt{h} = \frac{1}{n} = H(X)/n$, as predicted.

The above scenario can easily be generalized by allowing the
various components of the code to be transmitted over different
erasure channels. Consider, e.g., a binary repetition code of
length $n$ in which the first component is transmitted through
BEC($\delta$), where $\delta$ is constant, but the remaining components
are passed through BEC($\mathtt{h}$). In this case we have $\int_0^1 h(\mathtt{h})\, \mathrm{d}\mathtt{h} =
\big(H(X \mid Y(\delta, 1, \dots, 1)) - H(X \mid Y(\delta, 0, \dots, 0))\big)/n = \delta/n$
(assuming that $X$ is chosen uniformly at random from the set
of codewords). We will get back to this point shortly when
we introduce GEXIT functions in Definition 3.
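The area theorem for the erasure channel is easy to confirm by enumeration for very short codes. The following sketch is an illustration under stated assumptions, not the authors' procedure: it takes a binary linear code given as a list of 0/1 codewords with a uniform prior, all bits sent over the same BEC($\epsilon$), computes $H(X_i \mid Y_{\sim i})$ by summing over erasure patterns, and integrates the average EXIT function with a midpoint rule. The function names are hypothetical.

```python
from itertools import product

def exit_bec(codewords, i, eps):
    """H(X_i | Y_~i) in bits for a binary linear code (0/1 tuples) over BEC(eps)."""
    n = len(codewords[0])
    others = [j for j in range(n) if j != i]
    total = 0.0
    for pattern in product([0, 1], repeat=len(others)):  # 1 marks an erased position
        erased = {j for j, e in zip(others, pattern) if e}
        prob = 1.0
        for j in others:
            prob *= eps if j in erased else 1.0 - eps
        # By linearity, X_i is undetermined (one full bit of entropy) iff some codeword
        # is zero on every received position other than i but has c_i = 1.
        undetermined = any(
            c[i] == 1 and all(c[j] == 0 for j in others if j not in erased)
            for c in codewords
        )
        if undetermined:
            total += prob
    return total

def average_exit_area(codewords, steps=2000):
    """Midpoint-rule estimate of the area under the average EXIT function on [0, 1]."""
    n = len(codewords[0])
    area = 0.0
    for k in range(steps):
        eps = (k + 0.5) / steps
        area += sum(exit_bec(codewords, i, eps) for i in range(n)) / (n * steps)
    return area
```

For the $[3,1,3]$ repetition code the estimate should be close to $1/3$, and for the $[3,2,2]$ single parity-check code close to $2/3$, in both cases the rate $H(X)/n$ of the code.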

The concept of EXIT functions extends to general channels 
in the natural way. To simplify notation somewhat let us focus 
on the binary case. 

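Definition 2 ($h$ for BMS Channels): Let $X$ be a binary
vector of length $n$ chosen with probability $p_X(x)$. Assume that
transmission takes place over the family $\{\mathrm{BMS}(\mathtt{h})\}_{\mathtt{h}}$. Then

\[ h_i(\mathtt{h}) \triangleq H(X_i \mid Y_{\sim i}(\mathtt{h})), \qquad h(\mathtt{h}) \triangleq \frac{1}{n} \sum_{i=1}^{n} H(X_i \mid Y_{\sim i}(\mathtt{h})) = \frac{1}{n} \sum_{i=1}^{n} h_i(\mathtt{h}). \]

This is the definition of the EXIT function introduced by ten
Brink [17]-[21] (see footnote 2).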

We get a more explicit representation if we consider transmission
using binary linear codes. In this context recall that
a binary linear code is proper if it possesses a generator matrix
with no zero columns. As a consequence, in a proper binary
linear code half the codewords take on the value $+1$ and half
the value $-1$ in each given position.

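Lemma 1 ($h$ for Linear Codes and BMS Channels): Let
$X$ be chosen uniformly at random from a proper binary
linear code and assume that transmission takes place over the
family $\{\mathrm{BMS}(\mathtt{h})\}_{\mathtt{h}}$. Define

\[ \phi_i(y_{\sim i}) \triangleq \log \frac{p_{X_i \mid Y_{\sim i}}(+1 \mid y_{\sim i})}{p_{X_i \mid Y_{\sim i}}(-1 \mid y_{\sim i})}, \qquad (5) \]

and $\Phi_i \triangleq \phi_i(Y_{\sim i})$. Let $a_i$ denote the density of $\Phi_i$, assuming
that the all-one codeword was transmitted, and let
$a \triangleq \frac{1}{n} \sum_{i=1}^{n} a_i$. Then

\[ h(\mathtt{h}) = H(a), \]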

$^2$ More precisely, EXIT functions are usually defined as $I(X_i; Y_{\sim i}(\epsilon)) =
H(X_i) - H(X_i \mid Y_{\sim i}(\epsilon))$, which differs from our definition only in a trivial
way.



where $H(\cdot)$ is the entropy operator introduced in (2).

Proof: Note that $X_i \to \Phi_i \to Y_{\sim i}$ forms a Markov
chain.$^3$ Equivalently, we claim that $\Phi_i$ is a sufficient statistic
for $X_i$. From this we conclude that (see [22, Section 2.8])

\[ H(X_i \mid Y_{\sim i}) = H(X_i \mid \Phi_i). \]

Now note that since we assume that $X$ was chosen uniformly
at random from a proper binary linear code, it follows that
the prior for each $X_i$ is the uniform one. Therefore, $\Phi_i$ is in
fact a log-likelihood ratio. It is shown in [11, Lemma 3.37]
that, assuming that $X$ is chosen uniformly at random from a
proper binary linear code, the binary "channel" $p(\phi_i \mid x_i)$ is
symmetric. Further, note that the density of $\Phi_i$ conditioned on
$X_i = 1$ is equal to the density of $\Phi_i$ conditioned that the all-one
codeword was transmitted.$^4$ By assumption this $L$-density
is equal to $a_i$. We conclude that $H(X_i \mid \Phi_i) = H(a_i)$. ■

As the next example shows, the EXIT function does not 
fulfill the area theorem in the general case. 

Example 3 (EXIT Function for General BMS Channels):
Fig. 2 shows the EXIT function for the $[3,1,3]$ repetition
code as well as for the $[6,5,2]$ single parity-check code for
BEC($\mathtt{h}$), BSC($\mathtt{h}$), and BAWGNC($\mathtt{h}$). E.g., the EXIT function
for the $[n, n-1, 2]$ single parity-check code over BSC($\mathtt{h}$) is
given by

\[ h_i(\mathtt{h}) = h(\mathtt{h}) = h_2\!\left(\frac{1 - (1 - 2\epsilon(\mathtt{h}))^{n-1}}{2}\right), \]

where $\epsilon(\mathtt{h}) = h_2^{-1}(\mathtt{h})$. Note that these EXIT functions are
"ordered." More precisely, for a repetition code we get the
highest extrinsic entropy at the output for the channel family
$\{\mathrm{BSC}(\mathtt{h})\}_{\mathtt{h}}$ and we get the lowest such entropy if we use
instead the family $\{\mathrm{BEC}(\mathtt{h})\}_{\mathtt{h}}$. Indeed, one can show that
these two families are the least and most "informative" family
of channels over the whole class of BMS channels for a
repetition code [23]-[25]. The roles are exactly exchanged
at a check node. Since we know that the EXIT function
for the BEC fulfills the area theorem, it follows from these
extremality properties that the EXIT functions for the BSC
and the BAWGNC do not fulfill the area theorem. Indeed, for
a single parity-check code with $n = 3$ and the BSC($\mathtt{h}$) the
area under the EXIT function is given by

\[ \int_0^1 h_2\!\left(\frac{1 - (1 - 2\epsilon(\mathtt{h}))^{2}}{2}\right) \mathrm{d}\mathtt{h} \approx 0.643704 < \frac{2}{3}. \]
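The area quoted above can be reproduced numerically. The sketch below is an illustration rather than the authors' computation: it uses the closed-form EXIT function of the $[n, n-1, 2]$ single parity-check code over the BSC, substitutes $\mathtt{h} = h_2(\epsilon)$ so that $\mathrm{d}\mathtt{h} = \log_2\frac{1-\epsilon}{\epsilon}\, \mathrm{d}\epsilon$, and applies a midpoint rule on $(0, 1/2)$. The function names and the step count are assumptions.

```python
import math

def h2(x):
    """Binary entropy function in bits."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)

def spc_exit_bsc(eps, n):
    """H(X_i | Y_~i) for the [n, n-1, 2] single parity-check code over BSC(eps)."""
    return h2((1.0 - (1.0 - 2.0 * eps) ** (n - 1)) / 2.0)

def spc_exit_area_bsc(n, steps=200000):
    """Area under the EXIT curve in the entropy parameterization h = h2(eps).

    Uses dh = log2((1 - eps) / eps) d eps and a midpoint rule on (0, 1/2).
    """
    d_eps = 0.5 / steps
    area = 0.0
    for k in range(steps):
        eps = (k + 0.5) * d_eps
        area += spc_exit_bsc(eps, n) * math.log2((1.0 - eps) / eps) * d_eps
    return area
```

With `n = 3` the result should land close to the value 0.643704 stated in the example, strictly below the rate $2/3$ that an area theorem would require.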

Although the above fact might be disappointing it is not 
surprising. As it should be clear from the discussion at the 

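$^3$ For $z \in \mathbb{R}$, let $y_{\sim i}$ be an element of $\phi_i^{-1}(z)$ so that $z = \phi_i(y_{\sim i})$.
Then $p_{X_i \mid Y_{\sim i}, \Phi_i}(x_i \mid y_{\sim i}, z) = \frac{1}{1 + e^{-x_i z}} = p_{X_i \mid \Phi_i}(x_i \mid z)$.
From this we conclude that $p_{Y_{\sim i} \mid X_i, \Phi_i}(y_{\sim i} \mid x_i, z) = p_{Y_{\sim i} \mid \Phi_i}(y_{\sim i} \mid z)$.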

$^4$ To see this, note that, using the symmetry of the channel and
the equal prior on the codewords, we can write $p_{X_i \mid Y}(x_i \mid y) =
c(y) \sum_{\tilde{x} \in C:\, \tilde{x}_i = x_i} p_{Y \mid X}(y \tilde{x} \mid 1)$, where $c(y)$ is a constant independent
of $x_i$, $C$ denotes the code, $1$ denotes the all-one codeword, and
products of vectors are taken componentwise. In the same manner, if $x' \in C$, then
$p_{X_i \mid Y}(x_i x'_i \mid y x') = c(y x') \sum_{\tilde{x} \in C:\, \tilde{x}_i = x_i x'_i} p_{Y \mid X}(y x' \tilde{x} \mid 1)$.
Compare the density of the log-likelihood
ratio assuming that the all-one codeword was transmitted to the
one assuming that the codeword $x'$ was transmitted. The claim follows by
noting that for any $y \in \mathcal{Y}^n$, $p_{Y \mid X}(y \mid 1) = p_{Y \mid X}(y x' \mid x')$, and that in this
case also $p_{Y \mid X}(y \tilde{x} \mid 1) = p_{Y \mid X}(y x' \tilde{x} \mid x')$.
Fig. 2. The EXIT function of the [3, 1, 3] repetition code and the [6, 5, 2] 
parity-check code for the BEC(h) (solid curve), BSC(h) (dashed curve) and 
BAWGNC(h) (dotted curve). 



beginning of this section, the EXIT function is related to the 
GAT only in the case of the erasure channel. Let us therefore 
go back to the GAT and define the function which fulfills the 
area theorem in the general case. 

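Definition 3 (GEXIT Function): Let $X$ be a vector of
length $n$ chosen with probability $p_X(x)$ from $\mathcal{X}^n$. Let the
channel from $X$ to $Y$ be memoryless, where $Y_i$ is the result of
passing $X_i$ through the smooth family $\{\mathrm{M}(\epsilon_i)\}_{\epsilon_i}$, $\epsilon_i \in [0, 1]$.
Assume that all individual channels are parameterized in a
smooth (differentiable) way by a common parameter $\epsilon$, i.e.,
$\epsilon_i = \epsilon_i(\epsilon)$, $i \in [n]$. Let $\Omega$ be a further observation of $X$
so that $p_{\Omega \mid X, Y}(\omega \mid x, y) = p_{\Omega \mid X}(\omega \mid x)$. Then the $i$-th and the
(average) generalized EXIT (GEXIT) function are defined by

\[ g_i(\epsilon) \triangleq \frac{\partial H(X_i \mid Y, \Omega)}{\partial \epsilon_i}\, \frac{\mathrm{d}\epsilon_i}{\mathrm{d}\epsilon}, \qquad g(\epsilon) \triangleq \frac{1}{n} \sum_{i=1}^{n} g_i(\epsilon). \]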

Discussion: The definition is stated in quite general terms.
First note that if we consider the integral $\int_{\underline{\epsilon}}^{\overline{\epsilon}} g(\epsilon)\, \mathrm{d}\epsilon$, then from
Theorem 1 we conclude that the result is $\frac{1}{n}\big(H(X \mid Y(\overline{\epsilon}), \Omega) -
H(X \mid Y(\underline{\epsilon}), \Omega)\big)$. In words, if we smoothly change the individual
channel parameters as a function of $\epsilon$, then the
integral of $g_i(\epsilon)$ tells us how much the conditional entropy of
the system changes due to the total change of the parameter
$\epsilon_i$. To be concrete, assume, e.g., that all bits are sent through
Gaussian channels. We can imagine that we first only change
the parameter of the Gaussian channel through which bit 1 is
sent from its initial to its final value, then the parameter of the
second channel and so on. Alternatively, we can imagine that
all channel parameters are changed simultaneously. In the two
cases the integrals of the individual GEXIT functions differ
but their sum is the same and it equals the total change of the
conditional entropy due to the change of channel parameters.
Therefore, GEXIT functions can be considered to be a "local"
way of measuring the change of the conditional entropy of
a system. One should think of the common parameter $\epsilon$ as a
convenient way of parameterizing the path through "channel
space" that we are taking.

In many applications all channels are identical, and formulas
simplify significantly. In Section VII we will see a case
in which the extra degree of freedom afforded by allowing
different channels is important. The additional observation $\Omega$
is useful if we consider the design of iterative systems and
component-wise GEXIT functions. For what follows though
we will not need it. Hence, we will drop $\Omega$ in the sequel.

If we assume that the input is binary we obtain a more 
explicit expression for the GEXIT functions. 

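Lemma 2 ($g$ for BM Channels): Let $X$ be a binary vector
of length $n$ chosen with probability $p_X(x)$. Let the channel
from $X$ to $Y$ be memoryless, where $Y_i$ is the result of passing
$X_i$ over the smooth family $\{\mathrm{BM}(\mathtt{h}_i)\}_{\mathtt{h}_i}$, $\mathtt{h}_i \in [0, 1]$. Assume
that all individual channel families are parameterized in a
smooth (differentiable) way by a common parameter $\epsilon$, i.e.,
$\mathtt{h}_i = \mathtt{h}_i(\epsilon)$, $i \in [n]$. Then the $i$-th and the (average) generalized
EXIT (GEXIT) function are given by

\[ g_i(\epsilon) = \sum_{x_i} p(x_i) \int_{\phi_i, y_i} p(\phi_i \mid x_i)\, \frac{\partial}{\partial \mathtt{h}_i} p(y_i \mid x_i)\, \log_2\left\{ \frac{\sum_{x_i'} p(x_i' \mid \phi_i)\, p(y_i \mid x_i')}{p(x_i \mid \phi_i)\, p(y_i \mid x_i)} \right\} \mathrm{d}y_i\, \mathrm{d}\phi_i \cdot \frac{\mathrm{d}\mathtt{h}_i}{\mathrm{d}\epsilon}, \qquad (6) \]

\[ g(\epsilon) = \frac{1}{n} \sum_{i=1}^{n} g_i(\epsilon), \qquad (7) \]

where $\phi_i(y_{\sim i})$ and $\Phi_i$ are defined as in (5).

Discussion: As mentioned above, the derivative of $p(y_i \mid x_i)$ in
Eq. (6) has to be interpreted in general as in Eq. (1). Moreover,
writing the same expression as $g_i(\epsilon) = \int f(y_i)\, \frac{\partial}{\partial \mathtt{h}_i} p(y_i \mid x_i)\, \mathrm{d}y_i$,
the existence of such derivative follows from the channel
smoothness and the differentiability of $f(y)$ (if written as
a function of the log-likelihood $\log \frac{p(y_i \mid +1)}{p(y_i \mid -1)}$).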

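Proof: We proceed as in the proof of Lemma 1. We claim
that $X_i \to (\Phi_i, Y_i) \to Y$ forms a Markov chain (equivalently,
$(\Phi_i, Y_i)$ constitutes a sufficient statistic). To see this, fix $z \in \mathbb{R}$
and let $y_{\sim i}$ be an element of $\phi_i^{-1}(z)$, so that $z = \phi_i(y_{\sim i})$.
Then, using the fact that $Y_i$ is conditionally independent of
$Y_{\sim i}$, given $X_i = x_i$, we may write

\[ p_{X_i \mid Y_i, Y_{\sim i}, \Phi_i}(x_i \mid y_i, y_{\sim i}, z) = \frac{p_{Y_i \mid X_i}(y_i \mid x_i)\, p_{X_i \mid Y_{\sim i}, \Phi_i}(x_i \mid y_{\sim i}, z)}{\sum_{x_i' \in \mathcal{X}} p_{Y_i \mid X_i}(y_i \mid x_i')\, p_{X_i \mid Y_{\sim i}, \Phi_i}(x_i' \mid y_{\sim i}, z)}. \]

Since $X_i \to \Phi_i \to Y_{\sim i}$ forms a Markov chain (as
already shown in the proof of Lemma 1), we have
$p_{X_i \mid Y_{\sim i}, \Phi_i}(x_i \mid y_{\sim i}, z) = p_{X_i \mid \Phi_i}(x_i \mid z)$. Substituting in
the above equation, we get $p_{X_i \mid Y_i, Y_{\sim i}, \Phi_i}(x_i \mid y_i, y_{\sim i}, z) =
p_{X_i \mid Y_i, \Phi_i}(x_i \mid y_i, z)$, as claimed.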
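Therefore, we can rewrite $g_i(\epsilon)$ as

\[ g_i(\epsilon) = \frac{\partial H(X_i \mid Y)}{\partial \mathtt{h}_i}\, \frac{\mathrm{d}\mathtt{h}_i}{\mathrm{d}\epsilon} = \frac{\partial H(X_i \mid \Phi_i, Y_i)}{\partial \mathtt{h}_i}\, \frac{\mathrm{d}\mathtt{h}_i}{\mathrm{d}\epsilon}. \]

Expand $H(X_i \mid \Phi_i, Y_i)$ as

\[ H(X_i \mid \Phi_i, Y_i) = -\int_{\phi_i, y_i} \sum_{x_i} p(x_i, \phi_i, y_i) \log_2\big(p(x_i \mid \phi_i, y_i)\big)\, \mathrm{d}y_i\, \mathrm{d}\phi_i \]
\[ = -\int_{\phi_i, y_i} \sum_{x_i} p(x_i)\, p(\phi_i \mid x_i)\, p(y_i \mid x_i) \log_2\left\{ \frac{p(x_i \mid \phi_i)\, p(y_i \mid x_i)}{\sum_{x_i' \in \mathcal{X}} p(x_i' \mid \phi_i)\, p(y_i \mid x_i')} \right\} \mathrm{d}y_i\, \mathrm{d}\phi_i. \]

This form has the advantage that the dependence of
$H(X_i \mid \Phi_i, Y_i)$ upon the channel at position $i$ is completely
explicit. Let us therefore differentiate the above expression
with respect to $\mathtt{h}_i$, the parameter which governs the transition
probability $p(y_i \mid x_i)$. The terms obtained by differentiating
with respect to the channel inside the $\log_2$ vanish. For instance,
when differentiating with respect to the $p(y_i \mid x_i)$ at the
numerator, we get (up to a constant factor)

\[ \sum_{x_i} \int_{\phi_i, y_i} p(x_i)\, p(\phi_i \mid x_i)\, \frac{\partial}{\partial \mathtt{h}_i} p(y_i \mid x_i)\, \mathrm{d}y_i\, \mathrm{d}\phi_i = \frac{\partial}{\partial \mathtt{h}_i} \sum_{x_i} \int_{\phi_i, y_i} p(x_i)\, p(\phi_i \mid x_i)\, p(y_i \mid x_i)\, \mathrm{d}y_i\, \mathrm{d}\phi_i = 0. \]

When differentiating with respect to the outer $p(y_i \mid x_i)$ we get
the stated result. ■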
Although the last lemma was stated for the case of binary
channels, it poses no difficulty to generalize it. It is in fact
sufficient to replace $\phi_i(y_{\sim i})$ with any sufficient statistic of
$X_i$, given $Y_{\sim i} = y_{\sim i}$. For instance, one may take $\phi_i(y_{\sim i}) =
\{p_{X_i \mid Y_{\sim i}}(x_i \mid y_{\sim i}) : x_i \in \mathcal{X}\}$, which takes values in the $(|\mathcal{X}| - 1)$-dimensional
simplex, or any parameterization of it. The log-likelihood
can be regarded as a particular parameterization of
the 1-dimensional simplex. More generally, $p_{X_i \mid Y_{\sim i}}(x_i \mid y_{\sim i})$
is a natural quantity appearing in iterative decoding. The proof
(as well as the statement) applies verbatim to this case.

We get an even more compact description if we assume that 
transmission takes place using a binary proper linear code and 
that the channel is symmetric. 

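Lemma 3 ($g$ for Linear Codes and BMS Channels): Let
$X$ be chosen uniformly at random from a proper binary
linear code of length $n$. Let the channel from $X$ to $Y$ be
memoryless, where $Y_i$ is the result of passing $X_i$ over the
smooth family $\{\mathrm{BMS}(\mathtt{h}_i)\}_{\mathtt{h}_i}$. Assume that all individual
channels are parameterized in a smooth (differentiable) way
by a common parameter $\epsilon$, i.e., $\mathtt{h}_i = \mathtt{h}_i(\epsilon)$, $i \in [n]$. Let the
$i$-th channel be characterized by its $L$-density, which by some
abuse of notation we denote by $c_{\mathrm{BMS}(\mathtt{h}_i)}$. Let $\phi_i$ and $\Phi_i$ be as
defined in (5) and let $a_i$ denote the density of $\Phi_i$, assuming
that the all-one codeword was transmitted. Then

\[ g_i(\epsilon) = \int_{-\infty}^{+\infty} a_i(z)\, l^{c_{\mathrm{BMS}(\mathtt{h}_i)}}(z)\, \mathrm{d}z, \]

where

\[ l^{c_{\mathrm{BMS}(\mathtt{h}_i)}}(z) \triangleq \int_{-\infty}^{+\infty} \frac{\partial c_{\mathrm{BMS}(\mathtt{h}_i)}(w)}{\partial \epsilon} \log_2(1 + e^{-z-w})\, \mathrm{d}w. \]

Discussion: The remarks made after Lemma 2
apply in particular to the present case: We write
$\int_{-\infty}^{+\infty} \frac{\partial c_{\mathrm{BMS}(\mathtt{h}_i)}(w)}{\partial \epsilon} \log_2(1 + e^{-z-w})\, \mathrm{d}w$ as a proxy for
$\frac{\mathrm{d}}{\mathrm{d}\epsilon} \left\{ \int_{-\infty}^{+\infty} c_{\mathrm{BMS}(\mathtt{h}_i)}(w) \log_2(1 + e^{-z-w})\, \mathrm{d}w \right\}$. The latter
expression exists, since $\log_2(1 + e^{-z-w})$ is continuously
differentiable as a function of $w$ and by assumption the
channel family is smooth. Note further that $l^{c_{\mathrm{BMS}(\mathtt{h}_i)}}(z)$ is
continuous and non-negative so that $g_i(\epsilon)$ exists as well.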

Proof: Consider the expression for $g_i(\epsilon)$ as given in (6).
By assumption, $p(y_i \mid x_i)$ is symmetric for all $i \in [n]$. Further,
as already remarked in the proof of Lemma 1, the "channel"
$p(\phi_i \mid x_i)$ is symmetric as well. It follows from this and the
fact that $p_{X_i}(+1) = p_{X_i}(-1)$ (due to the assumption that the
code is proper and that codewords are chosen with uniform
probability) that the contributions to $g_i(\epsilon)$ for $x_i = +1$ and
$x_i = -1$ are identical. We can therefore assume without loss
of generality that $x_i = +1$. Recall that the density of $\Phi_i$
assuming that $X_i = 1$ is equal to the density of $\Phi_i$ assuming
that the all-one codeword was transmitted. The latter is by
definition equal to $a_i$. As remarked earlier, $a_i$ is symmetric.
Further, as discussed in the introduction, we can assume that
the $i$-th BMS channel already outputs log-likelihood ratios.
Therefore, $p_{Y_i \mid X_i}(y_i \mid +1) = c_{\mathrm{BMS}(\mathtt{h}_i)}(y_i)$. Finally, consider the
expression within the $\log_2$. If $x_i' = +1$ then the numerator and
denominator are equal and we get one. If on the other hand
$x_i' = -1$ then we get by the previous remarks the product of
the likelihoods. Putting this all together we get

\[ g_i(\epsilon) = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} a_i(z)\, \frac{\partial c_{\mathrm{BMS}(\mathtt{h}_i)}(w)}{\partial \epsilon} \log_2\left(1 + e^{-z-w}\right) \mathrm{d}z\, \mathrm{d}w. \]

The thesis follows by rearranging terms. ■
Example 4 (Alternative Kernel Representations): Note that because of the symmetry property of $L$-densities we can write

$$g(\epsilon) = \int_{-\infty}^{\infty} a(z)\, l^{c_{\text{BMS}(h)}}(z)\, dz = \int_{0}^{\infty} |a|(z)\, \frac{l^{c_{\text{BMS}(h)}}(z) + e^{-z}\, l^{c_{\text{BMS}(h)}}(-z)}{1 + e^{-z}}\, dz.$$

This means that the kernel is uniquely specified on the absolute value domain $[0, \infty]$, but that for each $z \in [0, \infty]$ we can split the weight of the kernel in any desired way between $+z$ and $-z$ so that $l^{c_{\text{BMS}(h)}}(z) + e^{-z}\, l^{c_{\text{BMS}(h)}}(-z)$ equals the desired value. In the sequel we will use this degree of freedom to bring some kernels into a more convenient form. Although it constitutes some abuse of notation, we will make no notational distinction between such equivalent kernels, even though pointwise they might not represent the same function.
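The freedom described in Example 4 can be checked numerically. The following is a minimal sketch for the BSC family; the function names, the finite-difference step, and the test point are illustrative assumptions, not part of the paper. It compares the kernel obtained by literally differentiating the channel density (normalized by $dh/d\epsilon$) with the closed-form ratio-of-logarithms kernel for the BSC. The two differ pointwise, yet their folded versions $l(z) + e^{-z} l(-z)$ agree.

```python
import math

def bsc_closed_kernel(z, eps):
    """Closed-form L-domain BSC kernel (ratio of logarithms)."""
    r = (1 - eps) / eps
    return math.log((1 + r * math.exp(-z)) / (1 + math.exp(-z) / r)) / math.log(r)

def bsc_derivative_kernel(z, eps, step=1e-6):
    """Kernel obtained by differentiating the BSC L-density w.r.t. eps
    (central finite difference), normalized by dh/d(eps) with h = h2(eps)."""
    def smoothed_entropy(e):
        # integral of c_BSC(e)(w) * log2(1 + exp(-z - w)) dw; the L-density
        # has mass (1 - e) at +log((1-e)/e) and mass e at -log((1-e)/e)
        return ((1 - e) * math.log2(1 + e / (1 - e) * math.exp(-z))
                + e * math.log2(1 + (1 - e) / e * math.exp(-z)))
    numerator = (smoothed_entropy(eps + step) - smoothed_entropy(eps - step)) / (2 * step)
    dh_deps = math.log2((1 - eps) / eps)
    return numerator / dh_deps

def folded(kernel, z, eps):
    """The only combination that matters for symmetric densities."""
    return kernel(z, eps) + math.exp(-z) * kernel(-z, eps)

if __name__ == "__main__":
    z, eps = 1.3, 0.1
    print(bsc_derivative_kernel(z, eps), bsc_closed_kernel(z, eps))
    print(folded(bsc_derivative_kernel, z, eps), folded(bsc_closed_kernel, z, eps))
```

Since a symmetric density satisfies $a(-z) = e^{-z} a(z)$, only the folded quantity enters the GEXIT integral, which is why either kernel may be used.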

As we have already remarked in the discussion right after Definition 3, the GEXIT functions $g_i(\epsilon)$ allow us to "locally" measure the change of the conditional entropy of a system. This property is particularly apparent in the representation of Lemma 3, where we see that the local measurement has two components: (i) the kernel, which depends on the derivative of the channel seen at the given position, and (ii) the distribution $a_i$, which encapsulates all our ignorance about the code behavior with respect to the $i$-th position. This representation is very intuitive. If we improve the observation of a particular bit (derivative of the channel with respect to the parameter), then the amount by which the conditional entropy of the overall system changes clearly depends on how well this particular bit was already known via the code constraints and the observations of the other bits (extrinsic posterior density): if the bit was already perfectly known, then the additional observation afforded will be useless, whereas if nothing was known about the bit, one would expect that the additional reduction in entropy of this bit fully translates to a reduction of the entropy of the overall system. We will see some quantitative statements of this nature in Section IV.

In the next three examples we compute the kernels $l^{c_{\text{BMS}(h)}}(z)$ for the standard families $\{\text{BEC}(h)\}_h$, $\{\text{BSC}(h)\}_h$, and $\{\text{BAWGNC}(h)\}_h$. If we consider a single family of BMS channels parameterized by the entropy $h$, it is convenient to "normalize" the GEXIT kernel so that it measures the "progress per $dh$". This means that in the following examples we compute

$$l^{c_{\text{BMS}(h)}}(z) = \frac{\int_{-\infty}^{\infty} \frac{\partial c_{\text{BMS}(h)}(w)}{\partial \epsilon} \log_2\left(1 + e^{-z-w}\right) dw}{\int_{-\infty}^{\infty} \frac{\partial c_{\text{BMS}(h)}(w)}{\partial \epsilon} \log_2\left(1 + e^{-w}\right) dw}. \qquad (8)$$



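For the $|D|$-domain BSC kernel of Example 9, $|D|l^{c_{\text{BSC}(h)}}(s) = 1 + \frac{s}{\log((1-\epsilon)/\epsilon)} \log\left(\frac{1 + 2\epsilon s - s}{1 - 2\epsilon s + s}\right)$, the limiting values are seen to be $\lim_{h \to 1} |D|l^{c_{\text{BSC}(h)}}(s) = 1 - s^2$ and $\lim_{h \to 0} |D|l^{c_{\text{BSC}(h)}}(s) = 1$.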

Example 10 (GEXIT Kernel, $|D|$-Domain - $\{\text{BAWGNC}(h)\}_h$): Using Example 7 and (9), it is straightforward to write the kernel in the $|D|$-domain as



Example 5 (GEXIT Kernel, $L$-Domain - $\{\text{BEC}(h)\}_h$): If we take the family $\{c_{\text{BEC}(h)}\}_h$, where $h = \epsilon$ denotes both the channel (intrinsic) entropy and the erasure probability, then a quick calculation shows that $l^{c_{\text{BEC}(h)}}(z) = \log_2(1 + e^{-z}) = l(z)$. In words, the GEXIT kernel with respect to the family $\{\text{BEC}(h)\}_h$ is the regular EXIT kernel.

Example 6 (GEXIT Kernel, $L$-Domain - $\{\text{BSC}(h)\}_h$): Let us now look at the family $\{c_{\text{BSC}(h)}\}_h$. Some calculus shows that



|f^|<:BAWGNC(h(e)) ('g') 



E 



+00 (l^s^)e~ — 5^ 



■dw 



'£{-1.+!} /+°° dw 



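$$l^{c_{\text{BSC}(h)}}(z) = \log\left(\frac{1 + \frac{1-\epsilon}{\epsilon} e^{-z}}{1 + \frac{\epsilon}{1-\epsilon} e^{-z}}\right) \Big/ \log\left(\frac{1-\epsilon}{\epsilon}\right),$$

where $\epsilon = h_2^{-1}(h)$. For a fixed $z \in \mathbb{R}$ and $h \to 0$, the kernel converges to 1 as $1 + z/\log(\epsilon)$, whereas the limit when $h \to 1$ is equal to $\frac{2}{1+e^{z}}$.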

Example 7 (GEXIT Kernel, $L$-Domain - $\{\text{BAWGNC}(h)\}_h$): Consider now the family $\{c_{\text{BAWGNC}(h)}\}_h$, where $h$ denotes the channel entropy. This family was defined in an earlier example. Recall that the noise is assumed to be Gaussian with zero mean and variance $\sigma^2$. A convenient parameterization for this case is $\epsilon = 2/\sigma^2$. This means that in the following $h = H(c_{\text{BAWGNC}(\sigma^2 = 2/\epsilon)})$. After some steps of calculus shown
in Appendix m and Lemma [Tsl we get 



As shown in the appendix, the limiting values are the same as for the BSC, i.e., $\lim_{h \to 1} |D|l^{c_{\text{BAWGNC}(h)}}(s) = 1 - s^2$ and $\lim_{h \to 0} |D|l^{c_{\text{BAWGNC}(h)}}(s) = 1$.

In Fig. 3 we compare the EXIT kernel (which is also the GEXIT kernel for the BEC) with the GEXIT kernels for BSC(h) and BAWGNC(h) in the $|D|$-domain for several channel parameters. Note that these kernels are distinct but quite similar. In particular, for $h = 0.5$ the GEXIT kernel with respect to BAWGNC(h) is hardly distinguishable from the regular EXIT kernel. The GEXIT kernel for the BSC shows more variation.

$|D|l^{c_{\text{BSC}(h)}}(s)$ (dotted line) and $|D|l^{c_{\text{BAWGNC}(h)}}(s)$ (solid line) at channel entropy rate $h = 0.1$ (left), $h = 0.5$ (middle) and $h = 0.9$ (right).



l+e" 



In the appendix we give alternative representations and/or interpretations of this kernel. In particular we discuss the relationship to the formulation presented by Guo, Shamai and Verdú in [7], [26], which uses a connection to the MSE detector, as well as the formulation by Macris in [8], which is based on the Nishimori identity.

One convenient feature of standard EXIT functions is that they are fairly similar for a given code across the whole range of BMS channels. Is this still true for GEXIT functions? GEXIT functions depend on the channel both through the kernel and through the extrinsic densities. Let us therefore compare the shape of the various kernels. It is most convenient to compare the kernels not in the $L$-domain but in the $|D|$-domain. A change of variables shows that in general the $L$-domain kernel, call it $l(z)$, and the associated $|D|$-domain kernel, denote it by $|D|l(s)$, are linked by relation (9).



Example 11 (Repetition Code): Consider the $[n, 1, n]$ repetition code. Let $\{c_h\}_h$ characterize a smooth family of BMS channels. For $n \in \mathbb{N}$, let $c_h^{*n}$ denote the $n$-fold convolution of $c_h$. The GEXIT function for the $[n, 1, n]$ repetition code is then given by $g(h) = \frac{1}{n} \frac{d}{dh} H(c_h^{*n})$. Explicitly, we get $g_{\text{BEC}}(h) = h^{n-1} = h_{\text{BEC}}(h)$. As a further example, $g_{\text{BSC}}$ is given in parametric form by



/i2(e). 



71 log (e/e) 

with $\bar{\epsilon} = 1 - \epsilon$.

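Example 12 (Single Parity-Check Code): Consider the dual code, i.e., the $[n, n-1, 2]$ parity-check code. Some calculations show that $g_{\text{BSC}}$ is given in parametric form by

$$\left(h_2(\epsilon),\; 1 - (1-2\epsilon)^{n-1}\, \frac{\log\left(\frac{1 + (1-2\epsilon)^{n}}{1 - (1-2\epsilon)^{n}}\right)}{\log\left(\frac{\bar{\epsilon}}{\epsilon}\right)}\right).$$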
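As a sanity check on the parametric form for the parity-check code, the area under the GEXIT curve should equal the code rate $(n-1)/n$ by the GAT. The sketch below integrates $g\, dh$ over $\epsilon \in (0, 1/2)$ with a midpoint rule, using $dh/d\epsilon = \log_2(\bar{\epsilon}/\epsilon)$. The function names and the step count are illustrative choices, and the closed form is the one obtained by evaluating the $|D|$-domain BSC kernel at $s = (1-2\epsilon)^{n-1}$.

```python
import math

def spc_gexit_bsc(eps, n):
    """GEXIT value of the [n, n-1, 2] parity-check code over BSC(eps)."""
    s = (1 - 2 * eps) ** (n - 1)
    u = (1 - 2 * eps) ** n
    return 1 - s * math.log((1 + u) / (1 - u)) / math.log((1 - eps) / eps)

def area_under_gexit(n, steps=100000):
    """Midpoint-rule approximation of the integral of g dh, with h = h2(eps)."""
    width = 0.5 / steps
    total = 0.0
    for k in range(steps):
        eps = (k + 0.5) * width
        dh_deps = math.log2((1 - eps) / eps)
        total += spc_gexit_bsc(eps, n) * dh_deps * width
    return total

if __name__ == "__main__":
    for n in (2, 3, 5):
        print(n, area_under_gexit(n), (n - 1) / n)
```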



$$|D|l(s) = \frac{1-s}{2}\, l\left(\log\frac{1-s}{1+s}\right) + \frac{1+s}{2}\, l\left(\log\frac{1+s}{1-s}\right). \qquad (9)$$
E.g., if we apply the above transformation to the previous 
examples we get the following results. 

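Example 8 (GEXIT Kernel, $|D|$-Domain - $\{\text{BEC}(h)\}_h$): We get $|D|l^{c_{\text{BEC}(h)}}(s) = h_2((1+s)/2)$.

Example 9 (GEXIT Kernel, $|D|$-Domain - $\{\text{BSC}(h)\}_h$): Some calculus shows that

$$|D|l^{c_{\text{BSC}(h)}}(s) = 1 + \frac{s}{\log((1-\epsilon)/\epsilon)}\, \log\left(\frac{1 + 2\epsilon s - s}{1 - 2\epsilon s + s}\right).$$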



No simple analytic expressions are known for the case of 
transmission over the BAWGNC. 

Fig. 4 compares EXIT to GEXIT curves for some repetition and some single parity-check codes.

Example 13 (Hamming Code): Consider the $[7, 4, 3]$ Hamming code. When transmission takes place over BEC(h), it is a tedious but conceptually simple exercise to show that the EXIT function is $h(\mathrm{h}) = 3\mathrm{h}^2 + 4\mathrm{h}^3 - 15\mathrm{h}^4 + 12\mathrm{h}^5 - 3\mathrm{h}^6$, see, e.g., [3], [16]. In a similar way, using the derivative of the conditional entropy, one can give an analytic expression for the GEXIT function assuming transmission takes place over the BSC. Both expressions are evaluated in Fig. 5 (left). A comparison between GEXIT and EXIT functions for the Hamming code and the BSC is shown in Fig. 5 (right).

Fig. 4. The EXIT (dashed) and GEXIT (dotted) function of the $[n, 1, n]$ repetition code and the $[n, n-1, 2]$ parity-check code assuming that transmission takes place over BSC(h) (left picture) or the BAWGNC(h) (right picture), $n \in \{2, 3, 4, 5, 6\}$.

Example 14 (Simplex Code): Consider now the dual of the Hamming code, i.e., the $[7, 3, 4]$ Simplex code. For transmission over the BEC we have $h(\mathrm{h}) = 4\mathrm{h}^3 - 6\mathrm{h}^5 + 3\mathrm{h}^6$. Fig. 5 compares GEXIT and EXIT functions for this code when transmission takes place over the BEC and over the BSC.

Fig. 5. Comparison of the GEXIT functions for the $[7, 4, 3]$ Hamming code and its dual. Left picture: Comparison between GEXIT functions when transmitting over the BEC (dashed line) and over the BSC (solid line). Right picture: Comparison between GEXIT (solid line) and EXIT (dashed line) functions when transmission takes place over the BSC.
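The two BEC polynomials of Examples 13 and 14 can be checked against the area theorem with exact rational arithmetic. The sketch below assumes the Simplex polynomial $4h^3 - 6h^5 + 3h^6$ (its leading term is taken as an assumption here; it is the value consistent with the duality relation $h^{\perp}(\mathrm{h}) = 1 - h(1 - \mathrm{h})$ for EXIT functions over the BEC). The dictionary representation and function names are illustrative.

```python
from fractions import Fraction

# Polynomials stored as {exponent: coefficient}.
HAMMING_EXIT = {2: 3, 3: 4, 4: -15, 5: 12, 6: -3}   # [7,4,3] Hamming code
SIMPLEX_EXIT = {3: 4, 5: -6, 6: 3}                  # [7,3,4] Simplex code (assumed form)

def evaluate(poly, h):
    """Evaluate the polynomial at h."""
    return sum(c * h ** k for k, c in poly.items())

def area(poly):
    """Exact integral of the polynomial over [0, 1]."""
    return sum(Fraction(c, k + 1) for k, c in poly.items())

if __name__ == "__main__":
    print(area(HAMMING_EXIT), area(SIMPLEX_EXIT))   # rates of the two codes
    h = Fraction(1, 4)
    print(evaluate(HAMMING_EXIT, h) + evaluate(SIMPLEX_EXIT, 1 - h))
```

The areas come out as the rates $4/7$ and $3/7$, and the duality relation holds at every rational test point.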



IV. Basic Properties of GEXIT Functions 

GEXIT functions fulfill the GAT by definition. Let us state 
a few more of their properties. 

We first show that the GEXIT function preserves the partial 
order implied by physical degradation. 

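Lemma 4: Let $X$ be chosen with probability $p_X(x)$ from $\mathcal{X}^n$. Let the channel from $X$ to $Y$ be memoryless, where $Y_i$ is the result of passing $X_i$ through the smooth and degraded family $\{M(\epsilon_i)\}_{\epsilon_i}$, $\epsilon_i \in I_i$. If $X_i \to Y_{\sim i} \to \Phi_i$ forms a Markov chain then

$$\frac{\partial H(X_i \mid Y)}{\partial \epsilon_i} \le \frac{\partial H(X_i \mid Y_i, \Phi_i)}{\partial \epsilon_i}. \qquad (10)$$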

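Proof: Since the derivatives in Eq. (10) are known to exist a.e., the above statement is in fact equivalent to saying that, for any $\epsilon'_i \ge \epsilon_i$,

$$H(X_i \mid Y_i(\epsilon'_i), Y_{\sim i}) - H(X_i \mid Y_i(\epsilon_i), Y_{\sim i}) \le H(X_i \mid Y_i(\epsilon'_i), \Phi_i) - H(X_i \mid Y_i(\epsilon_i), \Phi_i).$$

Here, $Y_i(\epsilon_i)$ and $Y_i(\epsilon'_i)$ are the result of transmitting $X_i$ through the channels with parameter $\epsilon_i$ and $\epsilon'_i$, respectively. We claim that

$$X_i \to Y_i(\epsilon_i) \to Y_i(\epsilon'_i),$$
$$X_i \to Y_{\sim i} \to \Phi_i,$$
$$(Y_i(\epsilon_i), Y_i(\epsilon'_i)) \to X_i \to (Y_{\sim i}, \Phi_i).$$

The first claim follows from the assumption that the channel family is degraded and the second claim is also part of the assumption. Finally, the third claim is true since the channel is memoryless.

The thesis is therefore a consequence of Lemma 5 stated below by making the following substitutions:

$$X = X_i, \quad Y = Y_i(\epsilon_i), \quad Y' = Y_i(\epsilon'_i), \quad Z = Y_{\sim i}, \quad Z' = \Phi_i.$$
■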



Lemma 5: Assume that $X \to Y \to Y'$, $X \to Z \to Z'$, as well as $(Y, Y') \to X \to (Z, Z')$ form Markov chains. Then

$$H(X \mid Y', Z) - H(X \mid Y, Z) \le H(X \mid Y', Z') - H(X \mid Y, Z'). \qquad (11)$$

Proof: The statement is equivalent to $H(X \mid Z, Y', Z') - H(X \mid Y, Z, Y', Z') \le H(X \mid Y', Z') - H(X \mid Y, Y', Z')$. Let us now condition on an event $(Y' = y', Z' = z')$. The proof is completed by showing that (here the conditioning upon $Y' = y'$, $Z' = z'$ is left implicit for the sake of simplicity)

$$H(X \mid Y, Z) - H(X \mid Y) - H(X \mid Z) + H(X) \ge 0. \qquad (12)$$

This inequality can be written in terms of mutual information as $I(Y; X \mid Z) \le I(Y; X)$. The statement is therefore a well-known consequence of the data processing inequality, see [22, p. 33], if we can show that, conditioned on $Y' = y'$, $Z' = z'$, $Y \to X \to Z$ forms a Markov chain. In formulae, we have to show that $p(y, z \mid x, y', z') = p(y \mid x, y', z')\, p(z \mid x, y', z')$, which in turn follows if we can show that $\frac{p(z \mid x, y, y', z')}{p(z \mid x, y', z')} = 1$. The last equality can be shown by first applying Bayes' law, then expanding all terms in the order $x$, $z'$, $y$ and $y'$, further canceling common terms and, finally, repeatedly using the conditions that $X \to Y \to Y'$, $X \to Z \to Z'$, as well as $(Y, Y') \to X \to (Z, Z')$ form Markov chains. ■
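Inequality (11) can be illustrated by brute force on a small example. The sketch below is an illustrative construction, not taken from the paper: $X$ is a uniform bit, $Y$ and $Z$ are outputs of independent binary symmetric channels with input $X$, and $Y'$, $Z'$ are further degraded versions of $Y$ and $Z$. This choice satisfies the three Markov conditions of Lemma 5, and the joint distribution is small enough to enumerate.

```python
import itertools
import math

def bsc(p, a, b):
    """Transition probability a -> b of a binary symmetric channel with crossover p."""
    return 1 - p if a == b else p

def joint(p_y, p_yy, p_z, p_zz):
    """Joint pmf of (X, Y, Y', Z, Z'): X uniform, Y = BSC(p_y)(X), Y' = BSC(p_yy)(Y),
    Z = BSC(p_z)(X), Z' = BSC(p_zz)(Z), all channels independent."""
    pmf = {}
    for x, y, y2, z, z2 in itertools.product((0, 1), repeat=5):
        pmf[(x, y, y2, z, z2)] = (0.5 * bsc(p_y, x, y) * bsc(p_yy, y, y2)
                                  * bsc(p_z, x, z) * bsc(p_zz, z, z2))
    return pmf

def cond_entropy_x(pmf, keep):
    """H(X | observed coordinates), where keep lists tuple indices (1=Y, 2=Y', 3=Z, 4=Z')."""
    joint_x_obs, marg_obs = {}, {}
    for key, p in pmf.items():
        obs = tuple(key[i] for i in keep)
        joint_x_obs[(key[0], obs)] = joint_x_obs.get((key[0], obs), 0.0) + p
        marg_obs[obs] = marg_obs.get(obs, 0.0) + p
    return -sum(p * math.log2(p / marg_obs[obs])
                for (_, obs), p in joint_x_obs.items() if p > 0)

def lemma5_sides(pmf):
    """Return (left-hand side, right-hand side) of inequality (11)."""
    lhs = cond_entropy_x(pmf, (2, 3)) - cond_entropy_x(pmf, (1, 3))
    rhs = cond_entropy_x(pmf, (2, 4)) - cond_entropy_x(pmf, (1, 4))
    return lhs, rhs

if __name__ == "__main__":
    print(lemma5_sides(joint(0.1, 0.2, 0.05, 0.3)))
```

The gain from upgrading $Y'$ to $Y$ is smaller when the side information is the stronger $Z$ than when it is the degraded $Z'$, which is the content of (11).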

In the case of linear codes and communication over a smooth and degraded family of BMS channels, Lemma 3 provides an explicit representation of the GEXIT function in terms of $L$-densities. In this case Lemma 4 becomes a statement on the corresponding kernel. For completeness, let us state the corresponding condition explicitly.

Corollary 1 ($l^{c_{\text{BMS}(h)}}(z)$ Preserves Partial Order): Consider a smooth and degraded family of BMS channels characterized by the associated family of $L$-densities $\{c_{\text{BMS}(h)}\}_h$. Let $a$ and $b$ denote two symmetric $L$-densities so that $a \prec b$, i.e., $b$ is physically degraded with respect to $a$. Then

$$\int_{-\infty}^{\infty} a(z)\, l^{c_{\text{BMS}(h)}}(z)\, dz \le \int_{-\infty}^{\infty} b(z)\, l^{c_{\text{BMS}(h)}}(z)\, dz.$$

An alternative proof of this statement is provided in the appendix.

We continue by examining some limiting cases. In the sequel $\mathfrak{E}$ denotes the error-probability operator. In the $L$-domain it is defined as $\mathfrak{E}(a) = \frac{1}{2} \int_{-\infty}^{\infty} a(z)\, e^{-(|z/2| + z/2)}\, dz$.

Lemma 6 (Bounds for GEXIT Kernel): Let $|D|l^{c_{\text{BMS}(h)}}(z)$ be the kernel associated to a smooth degraded family of BMS channels characterized by their family of $L$-densities $\{c_{\text{BMS}(h)}\}_h$. Then

$$1 - z \le |D|l^{c_{\text{BMS}(h)}}(z) \le 1.$$

Therefore, if $a$ is a symmetric $L$-density, we have

$$2\,\mathfrak{E}(a) \le \int_{-\infty}^{\infty} l^{c_{\text{BMS}(h)}}(z)\, a(z)\, dz \le 1.$$

Proof: In the appendix we show that $|D|l^{c_{\text{BMS}(h)}}(z)$ is non-increasing and concave. The upper bound follows from $|D|l^{c_{\text{BMS}(h)}}(z) \le |D|l^{c_{\text{BMS}(h)}}(z = 0) = 1$. The lower bound is proved in a similar way by using concavity and observing that $|D|l^{c_{\text{BMS}(h)}}(z = 1) = 0$. The final claim now follows from the fact that the $|D|$-domain kernel associated to $\mathfrak{E}$ is equal to $(1 - z)/2$. ■
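The bounds of Lemma 6 can be checked on a grid for the two kernels that are available in closed form. The sketch below uses the $|D|$-domain kernels $h_2((1+s)/2)$ for the BEC family and $1 + \frac{s}{\log(\bar{\epsilon}/\epsilon)}\log\frac{1-(1-2\epsilon)s}{1+(1-2\epsilon)s}$ for the BSC family; the grid size, tolerance, and function names are illustrative assumptions.

```python
import math

def h2(p):
    """Binary entropy function in bits, for 0 < p < 1."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bec_kernel_abs_d(s):
    return h2((1 + s) / 2)

def bsc_kernel_abs_d(s, eps):
    t = (1 - 2 * eps) * s
    return 1 + s * math.log((1 - t) / (1 + t)) / math.log((1 - eps) / eps)

def within_lemma6_bounds(kernel, points=999, tol=1e-12):
    """Check 1 - s <= kernel(s) <= 1 on an open grid in (0, 1)."""
    for k in range(1, points + 1):
        s = k / (points + 1)
        if not (1 - s - tol <= kernel(s) <= 1 + tol):
            return False
    return True

if __name__ == "__main__":
    print(within_lemma6_bounds(bec_kernel_abs_d))
    print(within_lemma6_bounds(lambda s: bsc_kernel_abs_d(s, 0.11)))
```

For a BSC($p$) extrinsic density, which is a point mass at $s = 1 - 2p$ in the $|D|$-domain, the lower bound reads $2p \le g$, i.e., twice the error probability.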

Lemma 7 (Further Properties of GEXIT Functions): Let $g(h)$ be the GEXIT function associated to a proper binary linear code of minimum distance larger than 1, and transmission over a complete smooth family of BMS channels. Then

$$g(0) = 0, \qquad g(1) = 1.$$

If the minimum distance of the code is larger than $k$, then

$$\frac{d^{k-1}}{dh^{k-1}}\, g(h)\Big|_{h=0} = 0.$$

Further, $g(h)$ is a non-decreasing function in $h$.

Proof: Consider the first two assertions. If $h = 0$, then the associated $L$-density corresponds to a "delta at infinity" (this is an easy consequence of the minimum distance being at least 2). On the other hand, if $h = 1$ then the corresponding $L$-density is a "delta at zero." The claim in both cases follows now by a direct calculation.

In order to prove the claim concerning the derivatives, we use the definition of $g(h)$ to write

$$\frac{d^{k-1}}{dh^{k-1}}\, g(h) = \frac{1}{n} \frac{d^{k}}{dh^{k}}\, H(X \mid Y(h)).$$

In order to evaluate the last derivative, we can first assume that the $i$-th bit is transmitted through a channel $\text{BMS}(h_i)$. Next we take partial derivatives with respect to $k$ of the entropies $\{h_i\}$. Finally we set $h_i = h$ for all bits $i$. We get therefore (neglecting the factor $1/n$):

$$\sum_{i_1, \dots, i_k} \frac{\partial^{k}}{\partial h_{i_1} \cdots \partial h_{i_k}}\, H(X \mid Y).$$

Of course $h_i$ can be set to 0 right at the beginning for all the bits that are not differentiated over. This is equivalent to passing the exact bits $X_i$. We get the expression

$$\frac{\partial^{k}}{\partial h_{i_1} \cdots \partial h_{i_k}}\, H\left(X \mid Y_{i_1}(h_{i_1}) \dots Y_{i_k}(h_{i_k}), X_{\sim i_1 \dots i_k}\right),$$

to be evaluated at $h_{i_1} = \dots = h_{i_k} = 0$. If the code has minimum distance larger than $k$, then any $n - k$ bits determine the whole codeword and $H(X \mid Y_{i_1}(h_{i_1}) \dots Y_{i_k}(h_{i_k}), X_{\sim i_1 \dots i_k}) = 0$. This finishes the proof. ■
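The statements of Lemma 7 can be illustrated with the $[7, 4, 3]$ Hamming code over the BEC, where the GEXIT function coincides with the EXIT function $3h^2 + 4h^3 - 15h^4 + 12h^5 - 3h^6$ (Examples 5 and 13). Since the minimum distance 3 is larger than $k = 2$, the first derivative must vanish at $h = 0$. The sketch below uses exact rational arithmetic; the helper names and grid are illustrative assumptions.

```python
from fractions import Fraction

# EXIT (= GEXIT over the BEC) polynomial of the [7,4,3] Hamming code,
# stored as {exponent: coefficient}.
HAMMING_GEXIT_BEC = {2: 3, 3: 4, 4: -15, 5: 12, 6: -3}

def poly_eval(coeffs, h):
    return sum(c * h ** k for k, c in coeffs.items())

def poly_derivative(coeffs):
    return {k - 1: c * k for k, c in coeffs.items() if k > 0}

def is_non_decreasing(coeffs, points=200):
    """Check that the derivative is non-negative on a rational grid in [0, 1]."""
    derivative = poly_derivative(coeffs)
    return all(poly_eval(derivative, Fraction(j, points)) >= 0 for j in range(points + 1))

if __name__ == "__main__":
    print(poly_eval(HAMMING_GEXIT_BEC, 0), poly_eval(HAMMING_GEXIT_BEC, 1))
    print(poly_eval(poly_derivative(HAMMING_GEXIT_BEC), 0))
    print(is_non_decreasing(HAMMING_GEXIT_BEC))
```

The derivative factors as $6h(1-h)^2(1 + 4h - 3h^2)$, which makes the monotonicity on $[0, 1]$ visible directly.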

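So far we have used the compact notation $g(h)$ for the GEXIT function. In some circumstances it is more convenient to use a notation that makes the dependence of the functional on the involved densities more explicit.

Definition 4 (Alternative Notation for GEXIT Functional): Consider a binary linear code and transmission over a smooth family of BMS channels characterized by the associated family of $L$-densities $\{c_\epsilon\}_\epsilon$. Let $\{a_\epsilon\}_\epsilon$ denote the associated family of average extrinsic MAP densities (which we assume smooth). Define

$$G(c_\epsilon, a_\epsilon) = \int_{-\infty}^{\infty} a_\epsilon(z)\, l^{c_\epsilon}(z)\, dz,$$

where

$$l^{c_\epsilon}(z) = \frac{\int_{-\infty}^{\infty} \frac{d c_\epsilon(w)}{d\epsilon} \log_2\left(1 + e^{-w-z}\right) dw}{\int_{-\infty}^{\infty} \frac{d c_\epsilon(w)}{d\epsilon} \log_2\left(1 + e^{-w}\right) dw}.$$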

Lemma 8 (GEXIT and Dual GEXIT Function): Consider a binary code $C$ and transmission over a complete and smooth family of BMS channels characterized by the associated family of $L$-densities $\{c_\epsilon\}_\epsilon$. Let $\{a_\epsilon\}_\epsilon$ denote the corresponding family of (average) extrinsic MAP densities. Then the standard GEXIT curve is given in parametric form by $\{H(c_\epsilon), G(c_\epsilon, a_\epsilon)\}$. The dual GEXIT curve is defined by $\{G(a_\epsilon, c_\epsilon), H(a_\epsilon)\}$. Both the standard and the dual GEXIT curve have an area equal to $r(C)$, the rate of the code.

Discussion: Note that both curves are "comparable" in that the first component measures the channel $c$ and the second component measures the MAP density $a$. The difference between the two lies in the choice of measure which is applied to each component.

Proof: The statement that $\{H(c_\epsilon), G(c_\epsilon, a_\epsilon)\}$ represents the standard GEXIT function follows by unwinding the corresponding definitions. The only statement that requires a proof is the one concerning the area under the "dual GEXIT" curve. We proceed as follows: Consider the entropy $H(c_\epsilon * a_\epsilon)$. We have

$$H(c_\epsilon * a_\epsilon) = \int_{-\infty}^{\infty} \left(\int_{-\infty}^{\infty} c_\epsilon(w)\, a_\epsilon(v - w)\, dw\right) \log_2\left(1 + e^{-v}\right) dv = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} c_\epsilon(w)\, a_\epsilon(z) \log_2\left(1 + e^{-w-z}\right) dw\, dz.$$

Consider now $\frac{dH(c_\epsilon * a_\epsilon)}{d\epsilon}$. Using the previous representation we get

$$\frac{dH(c_\epsilon * a_\epsilon)}{d\epsilon} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \frac{d c_\epsilon(w)}{d\epsilon}\, a_\epsilon(z) \log_2\left(1 + e^{-w-z}\right) dw\, dz + \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} c_\epsilon(w)\, \frac{d a_\epsilon(z)}{d\epsilon} \log_2\left(1 + e^{-w-z}\right) dw\, dz.$$

The first expression can be identified with the standard GEXIT curve except that it is parameterized by a generic parameter $\epsilon$. The second expression is essentially the same, but the roles of the two densities are exchanged. Integrate now this relationship over the whole range of $\epsilon$ and assume that this range goes from "perfect" (channel) to "useless". The integral on the left clearly equals 1. To perform the integrals on the right, reparameterize the first expression with respect to $h = \int_{-\infty}^{\infty} c_\epsilon(w) \log_2(1 + e^{-w})\, dw$ so that the integral is equal to the area under the standard GEXIT curve given by $\{H(c_\epsilon), G(c_\epsilon, a_\epsilon)\}$. In the same manner, reparameterize the second expression by $h = \int_{-\infty}^{\infty} a_\epsilon(w) \log_2(1 + e^{-w})\, dw$. Therefore the value of the second expression is equal to the area under the curve given by $\{H(a_\epsilon), G(a_\epsilon, c_\epsilon)\}$. Since the sum of the two areas equals one and the area under the standard GEXIT curve equals $r(C)$, it follows that the area under the second curve equals $1 - r(C)$. Finally, note that if we consider the inverse of the second curve by exchanging the two coordinates, i.e., if we consider the curve $\{G(a_\epsilon, c_\epsilon), H(a_\epsilon)\}$, then the area under this curve is equal to $1 - (1 - r(C)) = r(C)$, as claimed. ■
Example 15 (GEXIT Versus Dual GEXIT): Fig. 6 shows the standard GEXIT function and the dual GEXIT function for the $[5, 4, 2]$ code and transmission over the BSC. Although the two curves have quite distinct shapes, the area under the two curves is the same.

Fig. 6. Standard and dual GEXIT function of the $[5, 4, 2]$ code and transmission over the BSC.



V. Ensembles: Concentration and Asymptotic 
Setting 

For simple codes, e.g., single parity-check codes or repetition codes, $h$ and $g$ are relatively easy to compute. In general, though, it is not a trivial matter to determine the density of $\Phi_i$ required for the calculation. What we can typically compute are the extrinsic estimates if we use the BP decoder instead of the MAP decoder. It is therefore natural to look at the equivalent of EXIT and GEXIT functions if we substitute the extrinsic MAP estimates by their equivalent extrinsic BP estimates. Although most of the subsequent definitions and statements can be derived as easily for EXIT as for GEXIT functions, we focus on the latter. After all, these are the natural objects to study as suggested by the GAT.

Definition 5 ($g^{\text{BP},\ell}$ for Linear Codes and BMS Channels): Let $X$ be chosen uniformly at random from a proper binary linear code. Let the channel from $X$ to $Y$ be memoryless, where $Y_i$ is the result of passing $X_i$ through the smooth family $\{\text{BMS}(h_i)\}_{h_i}$, $h_i \in [0, 1]$. Assume that all individual channels are parameterized in a smooth way by a common parameter $\epsilon$, i.e., $h_i = h_i(\epsilon)$, $i \in [n]$. Let $\Phi_i^{\text{BP},\ell}$ denote the extrinsic estimate of the $i$-th bit at the $\ell$-th round of BP decoding, assuming an arbitrary but fixed representation of the code by a Tanner graph as well as an arbitrary but fixed schedule of the decoder. Then the BP GEXIT function is defined as

$$g_i^{\text{BP},\ell}(\epsilon) = \frac{\partial H(X_i \mid Y_i, \Phi_i^{\text{BP},\ell})}{\partial h_i}\, \frac{\partial h_i}{\partial \epsilon}.$$

The following statement, which is a direct consequence of the previous definition and Lemma 4, confirms the intuitive fact that the BP GEXIT function (which is associated to the suboptimal BP decoder) is at least as large as the GEXIT function itself, assuming only that the channel family is degraded.

Corollary 2 (GEXIT Versus BP GEXIT): Let $X$ be chosen uniformly at random from a proper binary linear code. Let the channel from $X$ to $Y$ be memoryless, where $Y_i$ is the result of passing $X_i$ through a smooth and degraded family $\{\text{BMS}(h_i)\}_{h_i}$, $h_i \in [0, 1]$. Assume that all individual channels are parameterized in a smooth (differentiable) way by a common parameter $\epsilon$, i.e., $h_i = h_i(\epsilon)$, $i \in [n]$. Let $g_i(\epsilon)$ and $g_i^{\text{BP},\ell}(\epsilon)$ be as defined in Definitions 3 and 5. Then

$$g_i(\epsilon) \le g_i^{\text{BP},\ell}(\epsilon).$$

Definition 6 (Asymptotic BP EXIT and GEXIT Functions): Consider a dd pair $(\lambda, \rho)$ and the corresponding sequence of ensembles $\text{LDPC}(n, \lambda, \rho)$. Further consider a smooth and degraded family $\{\text{BMS}(h)\}_h$. Assume that all bits of $X$ are sent through the channel $\text{BMS}(h)$. For $G \in \text{LDPC}(n, \lambda, \rho)$ and $i \in [n]$, let $g_i(G, h)$ and $g_i^{\text{BP},\ell}(G, h)$ denote the $i$-th MAP and BP GEXIT function associated to code $G$. By some abuse of notation, define the asymptotic (and average) quantities

$$g(h) \triangleq \limsup_{n \to \infty} E_G\left[\frac{1}{n} \sum_{i \in [n]} g_i(G, h)\right],$$
$$g^{\text{BP},\ell}(h) \triangleq \lim_{n \to \infty} E_G\left[\frac{1}{n} \sum_{i \in [n]} g_i^{\text{BP},\ell}(G, h)\right],$$
$$g^{\text{BP}}(h) \triangleq \lim_{\ell \to \infty} g^{\text{BP},\ell}(h).$$

For notational simplicity we suppress the dependence of the above quantities on the dd pair and the channel family $\{\text{BMS}(h)\}_h$.

In the above definitions we have taken the average of the 
individual curves over the ensemble. Let us now justify this 
approach by showing that the quantities are concentrated. The 
proof of the following statement, which asserts the concentra- 
tion of the conditional entropy, can be found in [5]. 

Theorem 2 (Concentration of Conditional Entropy): Let $G(n)$ be chosen uniformly at random from $\text{LDPC}(n, \lambda, \rho)$. Assume that $G(n)$ is used to transmit over a $\text{BMS}(h)$ channel. By some abuse of notation, let $H_{G(n)} = H_{G(n)}(X \mid Y)$ be the associated conditional entropy. Then for any $\xi > 0$

$$\Pr\left\{\left|H_{G(n)} - E_{G(n)}[H_{G(n)}]\right| > n\xi\right\} \le 2 e^{-nB\xi^2},$$

where $B = 1/(2(r_{\max} + 1)^2 (1 - r))$ and $r_{\max}$ is the maximal check-node degree.

Theorem 3 (Concentration of $g^{\text{BP},\ell}$): Consider the sequence of ensembles $\text{LDPC}(n, \lambda, \rho)$, where the dd pair $(\lambda, \rho)$ is fixed and $n$ tends to infinity. Then the limits $g^{\text{BP},\ell}(h) = \lim_{n \to \infty} n^{-1} E_G\left[\sum_{i \in [n]} g_i^{\text{BP},\ell}(G, h)\right]$ and $g^{\text{BP}}(h) = \lim_{\ell \to \infty} g^{\text{BP},\ell}(h)$ exist. Further, let $G(n)$ be chosen uniformly at random from $\text{LDPC}(n, \lambda, \rho)$. Assume that $G(n)$ is used to transmit over a $\text{BMS}(h)$ channel. Then, for all $\xi > 0$, there exists $\alpha_\xi > 0$, such that, for $n$ large enough

$$\Pr\left\{\left|\frac{1}{n} \sum_{i \in [n]} g_i^{\text{BP},\ell}(G(n), h) - g^{\text{BP},\ell}(h)\right| > \xi\right\} \le e^{-\alpha_\xi n}. \qquad (13)$$



Proof: Note that for a fixed iteration number $\ell$, the distribution of $\Phi_i^{\text{BP},\ell}$ (with $i$ a uniformly random node), assuming that the all-one codeword was sent, converges (at a speed of $1/n$) to the corresponding distribution of density evolution, denote it by $a^{\text{BP},\ell}$. The result now follows by noting that $g^{\text{BP},\ell}$ is the result of applying a bounded linear operator to this distribution. The proof of concentration is almost verbatim the same as the proof in [11], which shows the concentration of the probability of error under BP decoding, or the proof in [5], which relates to the concentration of the BP EXIT function. We will therefore skip the details. ■
Theorem 4 (Concentration of $g$): Let $G$ be chosen uniformly at random from $\text{LDPC}(n, \lambda, \rho)$ and consider the smooth and degraded family $\{\text{BMS}(h)\}_h$, $h \in [0, 1]$. Assume that $G$ is used to transmit over the $\text{BMS}(h)$ channel. Let $H_{G(n)} = H_{G(n)}(X \mid Y)$ be the associated conditional entropy, $g(G(n), h)$ the corresponding MAP GEXIT function, and $g_n(h) = E\, g(G(n), h)$. Let $J \subseteq [0, 1]$ be an interval on which $\lim_{n \to \infty} \frac{1}{n} E[H_{G(n)}]$ exists and is differentiable with respect to $h$. Then, for any $h \in J$ and $\xi > 0$ there exists an $\alpha_\xi > 0$ such that, for $n$ large enough

$$\Pr\left\{\left|g(G(n), h) - g_n(h)\right| > \xi\right\} \le e^{-n\alpha_\xi}.$$

Furthermore, if $\lim_{n \to \infty} \frac{1}{n} E[H_{G(n)}]$ is twice differentiable with respect to $h \in J$, there exists a strictly positive constant $A$ such that $\alpha_\xi > A\xi^4$.

The proof of this statement can be found in [5]. 

Let us summarize. We have seen that all the quantities which we introduced in Definition 6 are concentrated and that the BP quantities $g^{\text{BP},\ell}$ and $g^{\text{BP}}$ exist. Unfortunately, we had to use $\limsup$ for the definition of $g$, since proving the existence of the limit seems to be difficult. As discussed in [5], even in the case of transmission over the BEC the existence of the corresponding limit is not known in general but only follows from the explicit construction of the Maxwell decoder in all those cases where the Maxwell construction can be shown to result in MAP performance.

Note that $g^{\text{BP},\ell}$ and $g^{\text{BP}}$ have again a convenient representation in terms of the asymptotic BP densities. More precisely, we have

$$g^{\text{BP},\ell}(h) = \int_{-\infty}^{\infty} a^{\text{BP},\ell}(z)\, l^{c_{\text{BMS}(h)}}(z)\, dz, \qquad g^{\text{BP}}(h) = \int_{-\infty}^{\infty} a^{\text{BP}}(z)\, l^{c_{\text{BMS}(h)}}(z)\, dz,$$

where $a^{\text{BP},\ell}$ is the limiting density of $\Phi_i^{\text{BP},\ell}$ (with $i$ a uniformly random node) under the all-one codeword assumption as $n$ tends to infinity, associated to the dd pair $(\lambda, \rho)$. This density can easily be computed by density evolution. In a similar manner, $a^{\text{BP}}$ is the corresponding fixed-point density of density evolution.

In Fig. 7 we plot the BP GEXIT function $g^{\text{BP}}$ for a few regular LDPC ensembles and we compare them with the corresponding BP EXIT functions, which we denote by $h^{\text{BP}}$. We see that the curves are quite similar.

Fig. 7. BP GEXIT (solid curves) versus BP EXIT (dashed curves) for several regular LDPC ensembles for the BSC (left picture) and the BAWGNC (right picture).

Lemma 9 ($g \le g^{\text{BP}}$): Consider a dd pair $(\lambda, \rho)$ and transmission over the smooth and degraded family $\{\text{BMS}(h)\}_h$. Let $g(h)$ and $g^{\text{BP}}(h)$ denote respectively the corresponding asymptotic MAP and BP GEXIT functions as defined in Definition 6 when the code is chosen uniformly at random from the ensemble $\text{LDPC}(n, \lambda, \rho)$. Then

$$g(h) \le g^{\text{BP}}(h).$$

Proof: Using Corollary 2 we know that for any $G \in \text{LDPC}(n, \lambda, \rho)$ and $\ell \in \mathbb{N}$

$$g_G(\epsilon) \le g_G^{\text{BP},\ell}(\epsilon).$$

If we take first the expectation over the elements of the ensemble, then the $\limsup$ on both sides with respect to $n$, and finally the limit $\ell \to \infty$, we get the desired result.



VI. An Upper Bound on the MAP Threshold 

One important consequence of the area theorem is that it 
gives rise to an easy to compute upper bound on the threshold 
of MAP decoding. 

Definition 7 (MAP Threshold): Consider a dd pair $(\lambda, \rho)$ and a smooth and degraded family $\{\text{BMS}(h)\}_h$. The threshold $h^{\text{MAP}}$ is defined as

$$h^{\text{MAP}} \triangleq \inf\left\{h \in [0, 1] : \liminf_{n \to \infty} E_G[H(X \mid Y(h))]/n > 0\right\}.$$

Discussion: Let us consider the operational meaning of the above definition. Let $h < h^{\text{MAP}}$. Then by definition of the threshold, there exists a sequence of blocklengths $n_1, n_2, n_3, \dots$, so that the normalized (divided by the blocklength $n$) average conditional entropy converges to zero. By Theorem 2 it follows that most of the codes in the corresponding ensembles have a normalized conditional entropy less than any fixed constant. For sufficiently large blocklengths, a conditional entropy which grows sublinearly implies that the receiver can limit the set of hypotheses to a subexponential list which with high probability contains the correct codeword. Therefore, in this sense reliable communication is possible.

On the other hand, assume that $h > h^{\text{MAP}}$. In this case the normalized conditional entropy stays bounded away from zero by a strictly positive constant for all sufficiently large blocklengths. By Theorem 2 this is not only true for the average over the ensemble but for most elements from the ensemble. It follows that with most elements from the ensemble reliable communication is not possible.

Theorem 5 (Upper Bound on MAP Threshold): Consider a dd pair $(\lambda, \rho)$ whose asymptotic rate converges to the design rate $r(\lambda, \rho)$, see [5, Lemma 7]. Assume further that transmission takes place over a smooth and degraded family $\{\text{BMS}(h)\}_h$. Let $g^{\text{BP}}(h)$ denote the associated BP GEXIT function. Then

$$\liminf_{n \to \infty} E_G[H(X \mid Y(h))]/n \ge r(\lambda, \rho) - \int_{h}^{1} g^{\text{BP}}(h')\, dh'. \qquad (14)$$

Furthermore, if $\bar{h}$ denotes the largest positive number so that

$$\int_{\bar{h}}^{1} g^{\text{BP}}(h)\, dh = r(\lambda, \rho),$$

then $h^{\text{MAP}} \le \bar{h}$, where $h^{\text{MAP}}$ denotes the MAP threshold.

Proof: Let $G$ be chosen uniformly at random from the ensemble $\text{LDPC}(n, \lambda, \rho)$. By the GAT

$$r(\lambda, \rho) - \liminf_{n \to \infty} E_G[H(X \mid Y(h))]/n = \limsup_{n \to \infty} \frac{1}{n} E_G\left[H(X \mid Y(1)) - H(X \mid Y(h))\right] = \limsup_{n \to \infty} E_G\left[\int_{h}^{1} g(G, h')\, dh'\right].$$

We can exchange the expectation and the integral by Fubini's theorem; in fact $g(G, h')$ is measurable and $g(G, h') \in [0, 1]$. We can furthermore exchange the limit and the integral by the Fatou-Lebesgue lemma. We get

$$\liminf_{n \to \infty} E_G[H(X \mid Y(h))]/n \ge r(\lambda, \rho) - \int_{h}^{1} g(h')\, dh'.$$

Equation (14) is proved by applying Lemma 9.

The upper bound on the MAP threshold follows from the observation that the r.h.s. of Eq. (14) is non-decreasing in $h$. Therefore $\liminf_{n \to \infty} E_G[H(X \mid Y(h))]/n$ is bounded away from 0 for any $h > \bar{h}$ and the thesis follows from the definition of $h^{\text{MAP}}$. ■
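To make the bound of Theorem 5 concrete, the following sketch evaluates it for the special case of the BEC, where the GEXIT kernel is the EXIT kernel (Example 5) and density evolution reduces to the scalar recursion $x \leftarrow \epsilon\, \lambda(1 - \rho(1 - x))$. It is only an illustration under these assumptions: the restriction to regular $(l, r)$ ensembles, the function names, the grid size, and the stopping tolerance are all choices made here, not part of the paper. The curve is swept downward from $\epsilon = 1$, warm-starting each fixed-point iteration from the previous fixed point, and the area under the BP curve is accumulated until it reaches the design rate.

```python
def bp_gexit_bec(eps, x_start, l, r, tol=1e-13, max_iter=100000):
    """Iterate x <- eps * lambda(1 - rho(1 - x)) from x_start for a regular (l, r)
    ensemble; return (BP EXIT value Lambda(1 - rho(1 - x)), fixed point x)."""
    x = x_start
    for _ in range(max_iter):
        x_new = eps * (1 - (1 - x) ** (r - 1)) ** (l - 1)
        converged = abs(x_new - x) < tol
        x = x_new
        if converged:
            break
    return (1 - (1 - x) ** (r - 1)) ** l, x

def map_upper_bound_bec(l=3, r=6, steps=4000):
    """Largest h_bar with area under the BP curve on [h_bar, 1] equal to the design rate."""
    rate = 1 - l / r
    width = 1.0 / steps
    g_prev, x = bp_gexit_bec(1.0, 1.0, l, r)
    area = 0.0
    for k in range(1, steps + 1):
        eps = 1.0 - k * width
        g, x = bp_gexit_bec(eps, x, l, r)      # warm start from the fixed point above
        slice_area = 0.5 * (g + g_prev) * width  # trapezoid on [eps, eps + width]
        if slice_area > 0 and area + slice_area >= rate:
            # linear interpolation inside the slice where the area crosses the rate
            return eps + width * (1 - (rate - area) / slice_area)
        area += slice_area
        g_prev = g
    return 0.0

if __name__ == "__main__":
    print(map_upper_bound_bec(3, 6))
```

Starting each iteration from the fixed point of the slightly better channel above it keeps the recursion on the branch of largest fixed points, which is the one reached by BP decoding.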
Example 16: The following table presents the upper bounds on the MAP threshold for transmission over the BAWGNC(h) as derived from Theorem 5 for a few regular ensembles: $\lambda(x) = x^{l-1}$, $\rho(x) = x^{r-1}$. The same thresholds were first computed using the (non-rigorous) replica method from statistical physics [27]. In [28], they were shown to be upper bounds for $r$ even, using an interpolation technique. The present proof applies also to the case of odd $r$. It can be proved that the three characterizations of the threshold are indeed equivalent, i.e., they give exactly the same value.

l   r   h^BP        h-bar       h ([29], [30])   h^Sh
3   4   0.6507(5)   0.7417(1)   0.743231         3/4
3   5   0.5113(5)   0.5800(3)   0.583578         3/5
3   6   0.4160(5)   0.4721(5)   0.476728         1/2
4   6   0.5203(5)   0.6636(2)   0.663679         2/3

Also shown is the result of the information theoretic upper bound given in [29], which in turn is an improved version of the bound developed in [30]. For the specific case of transmission over the BSC and regular codes it is given by $h_2(\epsilon)$, where $\epsilon$ is the unique positive root of the equation $r\, h_2(\epsilon) = l\, h_2\left((1 - (1 - 2\epsilon)^r)/2\right)$.

VII. The Extended BP GEXIT Curve
A. Extended BP GEXIT Curve

As discussed in detail in [5] for the case of transmission over the BEC, the fundamental relationship which appears in the limit of large blocklengths between the MAP and the BP decoder is best described in terms of the extended EXIT curve. For the BEC this is the curve with parametric description $\left(\frac{x}{\lambda(1 - \rho(1 - x))},\, \Lambda(1 - \rho(1 - x))\right)$, where $x$ takes values in the subset $J \subseteq [0, 1]$ such that $x \le \lambda(1 - \rho(1 - x))$ ($J$ is in fact the union of a finite number of intervals). Note that the families $\{f_x\} \triangleq \{\text{BEC}(x)\}$ and $\{c_x\} \triangleq \{\text{BEC}(x / \lambda(1 - \rho(1 - x)))\}$, $x \in J$, have the following property: for each $x \in J$, $f_x$ constitutes a fixed-point density (of density evolution) for the channel $c_x$. Furthermore both channel families are smooth and satisfy $H(f_x) = x$. Finally, if $J = [0, 1]$ (a necessary condition for this to happen is $\lambda'(0)\rho'(1) > 1$) the families are said to be complete.
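For the BEC the extended curve is explicit, so it is easy to tabulate. The sketch below does this for a regular $(l, r)$ ensemble with $\lambda(x) = x^{l-1}$, $\rho(x) = x^{r-1}$, $\Lambda(x) = x^{l}$; the function names and the grid are illustrative assumptions. The minimum of the channel parameter $x / \lambda(1 - \rho(1 - x))$ over $x$ gives the BP threshold, and the fact that this parameter exceeds 1 for small $x$ shows that the family is not complete when $\lambda_2 = 0$.

```python
def ebp_point_bec(x, l=3, r=6):
    """Point of the extended BP EXIT curve for a regular (l, r) ensemble over the BEC:
    (channel parameter x / lambda(1 - rho(1 - x)), EBP value Lambda(1 - rho(1 - x)))."""
    y = 1 - (1 - x) ** (r - 1)          # 1 - rho(1 - x)
    return x / y ** (l - 1), y ** l

def bp_threshold_bec(l=3, r=6, steps=100000):
    """Smallest channel parameter on the curve, attained at the BP threshold."""
    return min(ebp_point_bec(k / steps, l, r)[0] for k in range(1, steps + 1))

def admissible(x, l=3, r=6):
    """True if x belongs to the set J, i.e. the channel parameter is at most 1."""
    return ebp_point_bec(x, l, r)[0] <= 1.0

if __name__ == "__main__":
    print(bp_threshold_bec(3, 6))
    print(admissible(0.01), admissible(0.5))
```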

Definition 8 (Complete Fixed-Point Family, $g^{\text{EBP}}$ and $g^{\text{BP}}$): Consider a degree distribution pair $(\lambda, \rho)$. We say that the families $\{f_x\}_x$ and $\{c_x\}_x$, $x \in [0, 1]$, form a complete fixed-point family for $(\lambda, \rho)$ if

(i) there exists a complete and degraded family $\{\text{BMS}(h)\}_h$ such that for each $x \in [0, 1]$, $c_x \in \{\text{BMS}(h)\}_h$

(ii) for each $x \in [0, 1]$, $f_x$ is a fixed-point density with respect to the degree distribution $(\lambda, \rho)$ and the channel $c_x$; this means that for each $x \in [0, 1]$, $f_x = c_x * \lambda(\rho(f_x))$

(iii) $\{f_x\}_x$ and $\{c_x\}_x$ are smooth with respect to $x$

(iv) $H(f_x) = x$

Let $a_x \triangleq \Lambda(\rho(f_x))$. The extended BP (EBP) GEXIT curve, call it $g^{\text{EBP}}(x)$, is then given in parametric form by $(H(c_x), g^{\text{EBP}}(x))$, where

$$g^{\text{EBP}}(x) = G(c_x, a_x) = \int_{-\infty}^{\infty} a_x(z)\, l^{c_x}(z)\, dz.$$

Finally, the BP GEXIT curve, call it $g^{\text{BP}}$, is the "envelope" of the $g^{\text{EBP}}$ curve.

Discussion: Contrary to our usual notation, we have used $x$ to parameterize the channel families and the function $g^{\text{EBP}}(x)$ and we have assumed that $H(f_x) = x$ (rather than $H(c_x) = x$). This has the following reason: in general, the EBP GEXIT function is not a single-valued function of the channel entropy but it is a single-valued function of the fixed-point entropy (see
Fig. 0. We prefer to use the parameter x instead of the usual 
parameter h, to remind ourselves that the channel Cx is the 
channel which belongs to the fixed-point density $f_x$ (and not the channel $c_h$, which by our previous notational convention has entropy $h$). Complete fixed-point families do not always exist. If, for instance, $\lambda_2 = 0$, then $x$ cannot be chosen arbitrarily close to 0. This is easily seen for transmission over the BEC. In this case $x \ge \underline{x}$, with $\underline{x}$ the smallest (non-vanishing) root of the equation $\lambda(1 - \rho(1 - x)) = x$.

From the definition it is not immediately obvious that for
a given degree distribution pair $(\lambda, \rho)$ and a complete and
degraded family $\{\text{BMS}(h)\}_h$, such a (complete or incomplete)
fixed-point family always exists, or that it is unique. For the
BEC we have an explicit formula for the family, but in the
general case the existence is far from trivial. We will get back
to this point in the next section.

One of the important applications of the EBP GEXIT 
curve is that it encodes very clearly the connection between 
MAP and BP decoding. As mentioned above, the BP GEXIT 
function is obtained as the 'envelope' of the EBP curve. More 
precisely, one has to choose, for each value of the channel 
entropy h, the branch of the EBP curve whose GEXIT value is 
the largest. As pointed out in the introduction when discussing
Fig. 2, a different single-valued function can be obtained by
applying the Maxwell construction, described in detail in [5],
to the EBP GEXIT curve. Motivated by the GAT as well as
by the BEC case, we formulate the following conjecture.

Conjecture 1: The (MAP) GEXIT function $g(h)$ is obtained
by applying the Maxwell construction to the extended BP
GEXIT curve $(H(c_x), g^{\text{EBP}}(x))$.
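
The following is a minimal numerical sketch of the construction for the BEC case that motivates the conjecture, not for the BSC examples below. It assumes the standard closed-form BEC parametrization, $\epsilon(x) = x / \lambda(1 - \rho(1 - x))$ with EXIT value $\Lambda(1 - \rho(1 - x))$, and implements the Maxwell construction in its equal-area form: the jump is placed where the area under the curve to its right equals the design rate. The function names and grid size are illustrative choices.

```python
import numpy as np

def ebp_curve_bec(lam, rho, Lam, n=200001):
    """Parametric EBP EXIT curve (eps(x), h_ebp(x)) for the BEC, x in (0, 1]."""
    x = np.linspace(1e-6, 1.0, n)   # entropy (erasure fraction) of the fixed point
    y = 1.0 - rho(1.0 - x)          # erasure probability of check-to-variable messages
    eps = x / lam(y)                # channel parameter solving x = eps * lam(y)
    h_ebp = Lam(y)                  # extrinsic erasure probability at variable nodes
    return x, eps, h_ebp

def maxwell_map_threshold(eps, h_ebp, rate):
    """Channel parameter of the largest-x point whose right-hand area equals `rate`."""
    # Trapezoidal pieces of the line integral of h_ebp d(eps) along the curve.
    d_area = 0.5 * (h_ebp[1:] + h_ebp[:-1]) * np.diff(eps)
    # area_right[i] = integral from grid point i up to x = 1.
    area_right = np.concatenate([np.cumsum(d_area[::-1])[::-1], [0.0]])
    idx = np.nonzero(area_right >= rate)[0].max()
    return eps[idx]

# (3,6)-regular ensemble: lambda(x) = x^2, rho(x) = x^5, Lambda(x) = x^3, rate 1/2.
lam = lambda x: x**2
rho = lambda x: x**5
Lam = lambda x: x**3

x, eps, h_ebp = ebp_curve_bec(lam, rho, Lam)
eps_bp = eps.min()                                   # BP threshold: leftmost point of the curve
eps_map = maxwell_map_threshold(eps, h_ebp, rate=0.5)
print(f"BP threshold  ~ {eps_bp:.4f}")
print(f"MAP threshold ~ {eps_map:.4f} (Maxwell construction)")
```

For the BSC the same area balance would be applied to a numerically computed EBP GEXIT curve, since no closed form is available there.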

Let us consider a few typical examples. In each of the 
following cases the complete fixed-point family was computed 
by a numerical procedure, which will be explained in the next 
section. 

Example 17 (LDPC$(x, x^5)$ - BSC): Consider the
dd pair $(\lambda, \rho) = (x, x^5)$ and the corresponding LDPC
ensemble with design rate $r = 2/3$. We assume that
transmission takes place over the family {BSC($\epsilon$)}. Recall
that for this code the BP threshold is given by the stability
condition. From Fig. 8 we see that, according to the numerical
calculation, the EBP GEXIT curve is a monotone function.
Assuming this is true, it follows that the EBP GEXIT curve is
equal to the BP GEXIT curve for this example. For any value
of the channel parameter a single fixed point density (apart
from the 'delta at infinity') is found. Also, a single fixed
point density exists for each value of the density entropy $x$.
The Maxwell construction is trivial in this case and yields a
MAP GEXIT curve equal to the BP GEXIT curve.

Example 18 ((3,6) LDPC Ensemble - BSC): Consider the
dd pair $(\lambda, \rho) = (x^2, x^5)$ and the corresponding LDPC ensemble
with design rate $r = 1/2$. We assume that transmission
takes place over the family {BSC($\epsilon$)}. Fig. 9 shows on the
left the EBP GEXIT curve and the corresponding BP GEXIT
curve, which has one jump. The picture on the right shows
the conjectured MAP GEXIT curve according to the Maxwell
construction. For this ensemble, we have $h^{\text{BP}} \approx 0.416$. The
MAP threshold implied by the Maxwell construction coincides 
with the one of Theorem m E"*' « 0.472. 

Example 19 (LDPC$(\frac{2}{5}x + \frac{3}{5}x^5, x^5)$ - BSC): Consider
the dd pair $(\lambda, \rho) = (\frac{2}{5}x + \frac{3}{5}x^5, x^5)$ and the corresponding
LDPC ensemble with design rate $r = 4/9$. We assume that
transmission takes place over the family {BSC($\epsilon$)}. Fig. 10
shows on the left the EBP GEXIT curve and the corresponding
BP GEXIT curve, which has one jump. The picture on the
right shows the conjectured MAP GEXIT curve according to
the Maxwell construction. The BP threshold is given by the
stability condition. As a consequence of this and Conjecture

Fig. 8. EBP GEXIT curve for the cycle-code ensemble with dd pair $(x, x^5)$.
The EBP GEXIT curve, BP GEXIT curve and MAP GEXIT curve coincide.

Fig. 9. EBP GEXIT curve for the (3, 6) ensemble. Left: EBP GEXIT curve
and corresponding BP GEXIT curve. Right: The conjectured MAP GEXIT
curve according to the Maxwell construction.

Fig. 10. EBP GEXIT curve for the $(\lambda, \rho) = (\frac{2}{5}x + \frac{3}{5}x^5, x^5)$ ensemble.
Left: EBP GEXIT curve and corresponding BP GEXIT curve. Right: The
conjectured MAP GEXIT curve according to the Maxwell construction.

Example 20 (LDPCf ^^+^^^'+"^'' , x^) - BSC): Consider 
the dd pair (^ 3x+6x^+iix ^^9y assume that transmission 
takes place over the family {BSC($\epsilon$)}. Fig. 11 shows on the
left the EBP GEXIT curve and the corresponding BP GEXIT
curve, which has two jumps. The picture on the right shows
the conjectured MAP GEXIT curve according to the Maxwell
construction. This curve also has two jumps.

Fig. 11. EBP GEXIT curve for the dd pair of Example 20. Left: EBP
GEXIT curve and corresponding BP GEXIT curve. Right: The conjectured
MAP GEXIT curve according to the Maxwell construction.



Example 21 ({ ■,x^) - BSC): Consider the dd pair ^ ' and the corresponding LDPC
ensemble. We assume that transmission takes place over the
family {BSC($\epsilon$)}. Fig. 12 shows on the left the EBP GEXIT
curve and the corresponding BP GEXIT curve, which has
two jumps. The picture on the right shows the conjectured
MAP GEXIT curve according to the Maxwell construction.
This example shows that a dd pair can have more BP jumps
than MAP jumps.

Fig. 12. EBP GEXIT curve for the dd pair of Example 21. Left: EBP
GEXIT curve and corresponding BP GEXIT curve. Right: The conjectured
MAP GEXIT curve according to the Maxwell construction.



VIII. How to Compute EBP GEXIT Curves: Basic
Properties and Area Theorem

In the previous pages we presented examples of EBP 
GEXIT curves for several LDPC ensembles. In this section 
we explain how these curves have been computed and we 
derive some of their basic properties, including the EBP Area 
Theorem. 

We start by noticing that ordinary density evolution cannot
be applied to the present case, for two reasons. First,
EBP curves include 'unstable branches'. We refer by such a 
term to branches along which the GEXIT curve is a decreasing 
function of the channel entropy. Such branches are expected to 
correspond to fixed point densities which are locally unstable 
under density evolution (whence the name). This expecta- 
tion can be confirmed analytically for the BEC case, and 
numerically for a general BMS channel. As a consequence, 
these fixed points cannot be approximated by iterating density 
evolution with a generic initial condition. 



The second problem is related to values of the channel
parameter for which multiple locally stable fixed point densities
coexist. This is the case for instance in Examples 18 to
21 above. In this case different initial conditions are required
to reach each of these densities by density evolution. A
systematic way of constructing all such initial conditions is,
however, not available.

The crucial observation for overcoming both these problems
consists in noticing that EBP GEXIT curves are naturally
parameterized by the entropy of the fixed point density. More
precisely, consider a smooth and degraded family {BMS(h)}
and $x \in [0, 1]$. Then, we expect that there exists at most one
value of the channel parameter $h = h(x)$ and one density $f_x$,
such that $H(f_x) = x$ and $(c_x = \text{BMS}(h(x)), f_x)$ forms a fixed
point pair.

This naturally suggests running density evolution at fixed
density entropy. Let us denote by $T_h$ the ordinary density
evolution operator at fixed channel BMS(h). Formally,

$$T_h(a) \triangleq c \circledast \lambda(\rho(a)), \tag{15}$$

where $c$ is the density associated with the channel BMS(h). For
any $x \in [0, 1]$, we define the density evolution operator at fixed
entropy $x$, $R_x$, as

$$R_x(a) \triangleq T_{h(a,x)}(a), \tag{16}$$

where $h(a, x)$ is the solution of $H(T_h(a)) = x$. Whenever no
such value of $h$ exists, $R_x(a)$ is left undefined. Since, for a
given $a$, the family $T_h(a)$ is ordered by physical degradation,
$H(T_h(a))$ is a non-decreasing function of $h$. As a consequence
the equation $H(T_h(a)) = x$ cannot have more than a single
solution. Furthermore, by the smoothness of the channel family
BMS(h), $H(T_h(a))$ is continuous. Notice that $H(T_0(a)) = 0$:
if the channel is noiseless, the output density at a variable
node is noiseless as well. Therefore, a necessary and sufficient
condition for a solution $h(a, x)$ to exist (when the family
$\{\text{BMS}(h)\}_h$ is complete) is that $H(T_1(a)) = H(\lambda(\rho(a))) \ge x$.

Any fixed point of the above transformation $R_x$, i.e., any $f$
such that $f = R_x(f)$, is also a fixed point of ordinary density
evolution for the channel BMS(h) with $h = h(f, x)$, and
corresponds to a point on the EBP GEXIT curve. Furthermore,
if a sequence of densities such that $a_{\ell+1} = R_x(a_\ell)$ converges
(weakly) to a density $f$, then $f$ is a fixed point of $R_x$, with
entropy $x$.

This motivates the following numerical procedure, which
has been used to determine the GEXIT curves plotted in the
previous section. (i) Set the initial condition $a_0 = \text{BMS}(x)$.
(ii) For $\ell \ge 0$ compute $a_{\ell+1} = R_x(a_\ell)$. In practice the
convolutions are evaluated numerically, either by sampling or
via Fourier transforms as in ordinary density evolution. Due to
the monotonicity of $H(T_h(a_\ell))$ in $h$, the value of $h(a_\ell, x)$ can
be efficiently found by bisection. (iii) The current estimate of
the GEXIT function is given by $(h_\ell, g_\ell^{\text{EBP}})$. Here $h_\ell = h(a_\ell, x)$
is the current estimate of the channel entropy, and



EBP 

9e 



(17) 



where Eq. (17) is evaluated on the density $\Lambda(\rho(a_\ell))$. (iv) Halt when
some convergence criterion is met and return the current estimate
$(h_\ell, g_\ell^{\text{EBP}})$. In practice one can require that (a properly defined)
distance between $a_\ell$ and $a_{\ell+1}$ becomes smaller than a threshold.
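
The steps (i) to (iv) can be sketched for the BEC, where every density is described by a single erasure probability and the entropy of a density equals that probability. This is only an illustration under that assumption: for a general BMS channel, `T` below would have to be replaced by a sampled or FFT-based density evolution step, and the GEXIT value by the proper GEXIT functional. All function names are hypothetical.

```python
# (3,6)-regular ensemble, edge and node perspective.
def lam(x): return x**2
def rho(x): return x**5
def Lam(x): return x**3

def T(h, a):
    """Ordinary density evolution on BEC(h): entropy of the variable-to-check message."""
    return h * lam(1.0 - rho(1.0 - a))

def solve_h(a, x, tol=1e-12):
    """Bisection for h(a, x), the channel entropy with H(T_h(a)) = x; None if undefined."""
    if T(1.0, a) < x:            # even the worst channel cannot reach entropy x
        return None
    lo, hi = 0.0, 1.0            # H(T_h(a)) is non-decreasing in h
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if T(mid, a) < x:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def fixed_entropy_de(x, max_iter=100, tol=1e-10):
    """Iterate a_{l+1} = R_x(a_l) starting from a_0 = BEC(x); return (h, g) or None."""
    a = x                                        # step (i)
    h = g = None
    for _ in range(max_iter):
        h = solve_h(a, x)                        # step (ii): channel giving entropy x
        if h is None:
            return None                          # R_x(a) is undefined
        a_next = T(h, a)
        g = Lam(1.0 - rho(1.0 - a_next))         # step (iii): BEC (G)EXIT value
        if abs(a_next - a) < tol:                # step (iv): convergence criterion
            break
        a = a_next
    return h, g

print(fixed_entropy_de(0.30))   # a point on the stable branch
print(fixed_entropy_de(0.10))   # a point on the unstable branch
print(fixed_entropy_de(0.02))   # None: no channel in [0, 1] supports this entropy
```

On the BEC the iteration converges in one step, since a density of entropy $x$ is unique there. The last call illustrates the remark that complete fixed-point families need not exist when $\lambda_2 = 0$.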

In all the examples discussed in the previous section, we
found that this procedure converges rapidly, and that the limit
point is (within numerical precision) independent of the initial
condition $a_0$. Proving these statements seems a challenging
task (notice that, unlike in ordinary density evolution, the
sequence $\{a_\ell\}$ is in general not ordered by physical degradation).
However, it is easy to show that, if $x$ is such that $R_x$ is 'well
defined', then this procedure has at least one fixed point.

Theorem 6: Let $(\lambda, \rho)$ be a dd pair, $x \in [0, 1]$, and $R_x$
the corresponding density evolution operator at fixed density
entropy defined as above, for the smooth, complete and
degraded family $\{\text{BMS}(h)\}_h$. If $H(\lambda(\rho(a))) \ge x$ for any
density $a$ with $H(a) = x$, then there exists at least one density
$f$ such that $R_x(f) = f$. Equivalently, $H(f) = x$ and there exists
$h \in [0, 1]$ such that $f$ is a fixed point of density evolution for
the channel BMS(h).

Proof: Consider the space $S_x$ of L-densities $a$ such that
$H(a) = x$. Any element in $S_x$ is a probability measure on
the completed real line, satisfying the symmetry condition
(formally $a(-y) = e^{-y} a(y)$). Vice versa, any such probability
measure (to be denoted formally by its 'density' $a$) with
$\mathbb{E}[\log_2(1 + e^{-y})] = x$ corresponds to a unique element of $S_x$.
Notice that the completed real line $\mathbb{R}_\infty$ is a compact metric
space (we can for instance identify it with $[-1, 1]$ through
the mapping $y \mapsto \tanh(y/2)$ and use the Euclidean metric
on $[-1, 1]$). Therefore, the space of probability measures on
$\mathbb{R}_\infty$ is sub-sequentially compact under the weak topology by
Prohorov's theorem [31]. Both the symmetry condition and
$H(a) = x$ are closed under the same topology, and therefore
$S_x$ is compact as well.

Let BL be the space of bounded Lipschitz functions on $\mathbb{R}_\infty$
(as above, we identify $\mathbb{R}_\infty$ with $[-1, 1]$ and consider the
Lipschitz condition with respect to the induced distance) with
the corresponding norm $\|\cdot\|_{\text{BL}}$. The space of probability
measures on $\mathbb{R}_\infty$ can be viewed as a convex subset of the
dual space BL*, and the topology induced by the dual norm
$\|\cdot\|_{\text{BL}}^*$ coincides with the weak topology (cf. [31, Chapter III,
§7]). As a consequence $S_x$ is a compact convex subset of a
normed linear space.

By hypothesis the mapping $a \mapsto R_x(a)$ is well defined for
any $a \in S_x$, and maps $S_x$ into itself. Furthermore, it is easily
seen to be continuous with respect to the weak topology. This
is a consequence of the Lipschitz continuity of the functions
$(x_1, \ldots, x_l) \mapsto (x_1 + \cdots + x_l)$ and $(x_1, \ldots, x_{r-1}) \mapsto
2\,\text{atanh}(\tanh(x_1/2) \cdots \tanh(x_{r-1}/2))$. Therefore $R_x$ is
compact and, by Schauder's fixed point theorem (cf. [32], Chapter
4), it has at least one fixed point. ■
Notice that the above procedure, as well as Theorem 6, holds
unchanged if the entropy functional $H(\cdot)$ is replaced by
any continuous linear functional which preserves physical
degradation.

In checking the hypothesis of Theorem 6, as well as in
applications, it is important to prove bounds on the entropy of fixed
point pairs $(f, c)$. We start by recalling upper and lower bounds
on the entropy of $T_h(a)$, which follow straightforwardly from
[23]-[25], [33].



Lemma 10 (Lower Bound): Consider a dd pair $(\lambda, \rho)$ and
transmission over the channel BMS(h). Let

$$\underline{l}(x) \triangleq \lambda(x), \qquad \underline{r}(x) \triangleq \sum_i \rho_i\, h_2\!\left(\frac{1 - (1 - 2\epsilon(x))^{i-1}}{2}\right),$$

where $\epsilon(x) = h_2^{-1}(x)$. If $a$ is an L-density with $H(a) = x$,
then

$$H(T_h(a)) \ge h\, \underline{l}(\underline{r}(x)).$$

Proof: Following Refs. [23]-[25], [33], for fixed $H(a)$
and $H(b)$, $a \circledast b$ has minimum entropy if $a$ and $b$ are the
densities corresponding to a BEC. On the other hand, for the
convolution at a parity-check node the minimum is achieved
when the input densities correspond to a BSC. The lemma
follows by applying these bounds to random variable and
check nodes with degree distributions given by $\lambda$ and $\rho$. ■
This result can be used to check the hypotheses of Theorem 6.
We deduce that, if $\underline{l}(\underline{r}(x)) \ge x$ for some $x \in [0, 1]$, then there
exists a fixed point pair $(f, c)$ with $H(f) = x$ and $c = \text{BMS}(h)$
for some $h$. For instance, for cycle codes (i.e., for $\lambda(x) = x$)
this implies that such a fixed point pair $(f, c)$ exists for any
$H(f) = x \in [0, 1]$.

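Lemma 11 (Upper Bound): Consider a dd pair $(\lambda, \rho)$ and
transmission over the channel BMS(h). Let

$$\overline{l}(h, x) \triangleq \sum_i \lambda_i\, l_{i-1}(h, x), \qquad \overline{r}(x) \triangleq 1 - \rho(1 - x),$$

where

$$l_i(h, x) \triangleq \sum_{k \in \{\pm 1\}} \sum_{j=0}^{i} \binom{i}{j} (1 - \epsilon(x))^{j}\, \epsilon(x)^{i-j}\, \alpha_k(h)\, \log_2\!\left[1 + \frac{\epsilon(x)^{2j-i}\, \alpha_{-k}(h)}{(1 - \epsilon(x))^{2j-i}\, \alpha_k(h)}\right],$$

$\alpha_{+1}(h) = 1 - \epsilon(h)$, $\alpha_{-1}(h) = \epsilon(h)$, and $\epsilon(h) = h_2^{-1}(h)$ as
above. If $a$ is an L-density with $H(a) = x$, then

$$H(T_h(a)) \le \overline{l}(h, \overline{r}(x)).$$

Proof: Apply the upper bounds of [23]-[25], [33] (simply
interchange BEC and BSC). ■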
Theorem 7 (Bounds on EXIT Function): Consider a
dd pair $(\lambda, \rho)$ and transmission over the degraded family
$\{\text{BMS}(h)\}_h$. Define the functions

$$\underline{L}(x) \triangleq \Lambda(x), \qquad \overline{L}(x) \triangleq \sum_i \Lambda_i\, l_i(1, x),$$

and $\overline{l}^{-1}(x, x') = \max\{h : \overline{l}(h, x') = x\}$ (with the convention
$\overline{l}^{-1}(x, x') = 0$ if the set is empty). Let $f$ denote any fixed point
of density evolution, i.e., $f = T_h(f)$. If $H(f) = x$ then

$$\overline{l}^{-1}(x, \overline{r}(x)) \le h \le x / \underline{l}(\underline{r}(x)),$$
$$\underline{L}(\underline{r}(x)) \le h^{\text{EBP}} \le \overline{L}(\overline{r}(x)).$$

In words, the entropy parameters of any fixed points of
density evolution, and so in particular the function $h^{\text{EBP}}$, are
contained in the union of rectangles as given above.

Proof: The first two inequalities follow from Lemmas 10
and 11. From Lemma 10 we get $x = H(f) = H(T_h(f)) \ge
h\, \underline{l}(\underline{r}(x))$, which gives the upper bound on $h$. Analogously,
Lemma 11 implies $x \le \overline{l}(h, \overline{r}(x))$. Since $\overline{l}(h, \overline{r}(x))$ is
monotonically increasing in $h$, this relation can be inverted as in the
thesis of the theorem.

Given the fixed point $f$, the corresponding EXIT entropy at
variable nodes is $h^{\text{EBP}} = H(\Lambda(\rho(f)))$. The bounds are obtained
as in the proofs of Lemmas 10 and 11. ■
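
As an illustration, the bounding functions of Lemmas 10 and 11 can be evaluated numerically. The sketch below assumes the forms $\underline{r}(x) = \sum_i \rho_i h_2((1-(1-2\epsilon(x))^{i-1})/2)$, $\overline{r}(x) = 1-\rho(1-x)$, and $l_i(h,x)$ equal to the entropy obtained when $i$ BSC messages of entropy $x$ are combined with a BSC observation of entropy $h$ at a variable node. Degree distributions are passed as dictionaries mapping degree to edge-perspective coefficient, and all function names are hypothetical.

```python
import math

def h2(p):
    """Binary entropy function in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def h2_inv(x, tol=1e-12):
    """Inverse of h2 on [0, 1/2], found by bisection."""
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if h2(mid) < x:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def r_lower(rho, x):
    """Check-node lower bound (BSC inputs)."""
    e = h2_inv(x)
    return sum(c * h2((1.0 - (1.0 - 2.0 * e) ** (i - 1)) / 2.0) for i, c in rho.items())

def r_upper(rho, x):
    """Check-node upper bound (BEC inputs): 1 - rho(1 - x)."""
    return 1.0 - sum(c * (1.0 - x) ** (i - 1) for i, c in rho.items())

def l_lower(lam, x):
    """Variable-node lower bound (BEC inputs): lambda(x)."""
    return sum(c * x ** (i - 1) for i, c in lam.items())

def l_i(i, h, x):
    """Entropy after combining i BSC messages (entropy x) with a BSC observation (entropy h)."""
    e = h2_inv(x)
    alpha = {+1: 1.0 - h2_inv(h), -1: h2_inv(h)}
    total = 0.0
    for k in (+1, -1):
        for j in range(i + 1):
            w = math.comb(i, j) * (1.0 - e) ** j * e ** (i - j) * alpha[k]
            if w == 0.0:
                continue
            ratio = (e / (1.0 - e)) ** (2 * j - i) * alpha[-k] / alpha[k]
            total += w * math.log2(1.0 + ratio)
    return total

def l_upper(lam, h, x):
    """Variable-node upper bound: sum_i lambda_i l_{i-1}(h, x)."""
    return sum(c * l_i(i - 1, h, x) for i, c in lam.items())

# dd pair (2/5 x + 3/5 x^5, x^5)
lam = {2: 0.4, 6: 0.6}
rho = {6: 1.0}
x = 0.3
h_max = x / l_lower(lam, r_lower(rho, x))   # upper bound on the channel entropy h
h_bec = x / l_lower(lam, r_upper(rho, x))   # channel parameter of the BEC fixed point
print(f"x = {x}: h <= {h_max:.4f}; BEC fixed point has h = {h_bec:.4f}")
```

The BEC value falls inside the admissible range, as it must, since the bounds hold for every channel family.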
Discussion: The bounds given above are by no means best
possible. First, the given bounds are "universal" in the sense
that they are valid for all channel distributions. Better bounds
for any specific channel family can be derived by taking the
actual input distribution into account. Even in the universal
case slightly better bounds can be given by taking into
account that, at the variable node before convolution with
the channel, the incoming message density cannot be of
arbitrary shape but is already the convolution of several
message densities. Second, tighter bounds on the extremes of
information combining have been derived in [34] and can be
translated to give tighter bounds on EXIT functions, albeit
at the price of more complex expressions. Finally, by using
similar techniques one can also give bounds on the entropy
versus GEXIT parameter of any fixed point with respect to
any smooth channel family.

Example 22 (LDPC$(\frac{2}{5}x + \frac{3}{5}x^5, x^5)$): Consider again
the dd pair $(\lambda, \rho) = (\frac{2}{5}x + \frac{3}{5}x^5, x^5)$. Fig. 13 shows
on the left the construction of the bounded region (union
of rectangles) which contains all EBP GEXIT curves. The
dashed lines represent the individual curves traced out by the
corner points of the rectangles. On the right this is compared
to the actual EBP GEXIT curves for transmission over the
BSC and the BEC families (solid lines).

Fig. 13. Left: Construction of bounding region for all EBP EXIT curves
for the dd pair $(\lambda, \rho) = (\frac{2}{5}x + \frac{3}{5}x^5, x^5)$. Right: The EBP EXIT curves
for transmission over the BSC and the BEC families.



Theorem 8 (EBP Area Theorem): Consider the
dd pair $(\lambda, \rho)$ and transmission over the smooth and degraded
family {BMS(h)}. Let $g^{\text{EBP}}$ denote the corresponding EBP
GEXIT function. Assume that the corresponding $\{f_x\}_x$ and
$\{c_x\}_x$, $x \in [0, 1]$, form a complete fixed-point family. Then

$$\int_0^1 g^{\text{EBP}}(x)\, dx = 1 - \frac{\int \rho}{\int \lambda}.$$

Proof: First, let us assume that the ensemble is $(l, r)$-regular.
Consider a variable node and the corresponding computation
tree of depth one as shown in Fig. 14. Let us assume
that the bit associated with the root node is passed through
the channel characterized by $c_x$, while the ones associated with
the leaf nodes are passed through a channel characterized by
$f_x$. Apply the GAT: let $X = (X_1, \ldots, X_{1+l(r-1)})$ be the
transmitted codeword chosen uniformly at random from the
tree code and $Y(x)$ be the result of passing the bits of $X$
through their respective channels with parameter $x$. Note that
$H(X \mid Y(x = 1)) - H(X \mid Y(x = 0)) = H(X)$. This follows
since by assumption the fixed-point family is complete. In
particular this implies that the channel for $x = 0$ is the
"noiseless" channel so that $H(X \mid Y(x = 0)) = 0$. By the
GAT, this difference is equal to the sum of the integrals of the
individual $g_i$ curves, where the integral extends from $x = 0$
to $x = 1$. There are two types of individual $g_i$ curves, namely
the one associated with the root node, call it $g_r$, and the $l(r - 1)$
ones associated with the leaf nodes, call them $g_l$. To summarize,
the GAT states

$$H(X) = \int_0^1 g_r(x)\, dx + l(r - 1) \int_0^1 g_l(x)\, dx.$$

Note that $H(X) = 1 + l(r - 1) - l = 1 + l(r - 2)$ since
the computation tree contains $1 + l(r - 1)$ variable nodes
and $l$ check nodes. Moreover, $\int_0^1 g_l(x)\, dx = \int_0^1 (1 - \rho(1 - x))\, dx
= (r - 1)/r$. This follows by applying the GAT once
again to a $[r, r - 1, 2]$ single parity-check code. Collecting
these observations and solving for $\int_0^1 g_r(x)\, dx$, we get

$$\int_0^1 g_r(x)\, dx = 1 - \frac{l}{r},$$

which is the design rate, as claimed, since $g_r = g^{\text{EBP}}$.

Fig. 14. Computation tree of depth one for the (2, 4)-regular LDPC
ensemble.

The irregular case follows in the same manner: we consider
the ensemble of computation trees of depth one where the
degree of the root node is chosen according to the node
degree distribution $\Lambda$ and each edge emanating from this root
node is connected to a check node whose degree is chosen
according to the edge degree distribution $\rho$. As before, leaf
nodes experience the channel characterized by $f_x$, whereas
the root node experiences the channel characterized by $c_x$.
We apply the GAT to each such choice and average with the
respective probabilities. ■
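
A quick numerical sanity check of the area statement is possible on the BEC, where the EBP curve is available in closed form. The sketch below assumes the parametrization $\epsilon(x) = x/\lambda(1-\rho(1-x))$ with EXIT value $\Lambda(1-\rho(1-x))$ and integrates the area under the parametric curve, which should equal $1 - \int\rho / \int\lambda$. The helper names are illustrative.

```python
import numpy as np
from numpy.polynomial import Polynomial

def design_rate(lam, rho):
    """Design rate 1 - (int rho) / (int lambda), integrals taken over [0, 1]."""
    return 1.0 - rho.integ()(1.0) / lam.integ()(1.0)

def ebp_area_bec(lam, rho, n=400001):
    """Area under the parametric BEC EBP EXIT curve (eps(x), h_ebp(x)), x in (0, 1]."""
    Lam = lam.integ() * (1.0 / lam.integ()(1.0))   # node-perspective distribution
    x = np.linspace(1e-7, 1.0, n)
    y = 1.0 - rho(1.0 - x)
    eps = x / lam(y)
    h_ebp = Lam(y)
    # Trapezoidal rule for the line integral of h_ebp d(eps).
    return float(np.sum(0.5 * (h_ebp[1:] + h_ebp[:-1]) * np.diff(eps)))

# Coefficients are listed in ascending order of the power of x.
pairs = {
    "(x, x^5)": (Polynomial([0, 1]), Polynomial([0, 0, 0, 0, 0, 1])),
    "(2/5 x + 3/5 x^5, x^5)": (Polynomial([0, 0.4, 0, 0, 0, 0.6]),
                               Polynomial([0, 0, 0, 0, 0, 1])),
}
areas = {name: (ebp_area_bec(l, r), design_rate(l, r)) for name, (l, r) in pairs.items()}
for name, (area, rate) in areas.items():
    print(f"{name}: area = {area:.5f}, design rate = {rate:.5f}")
```

The agreement on the BEC follows from integration by parts and is independent of the GAT argument above; it serves only as a consistency check of the statement.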

This result imposes strong constraints on BP GEXIT
functions and their relation to MAP GEXIT functions. Here
is an example.

Corollary 3: Consider communication over the smooth and
degraded family $\{\text{BMS}(h)\}_h$, $h \in [0, 1]$, using uniformly
random codes from the ensemble LDPC$(n, \lambda, \rho)$ and assume
that the rate of this ensemble converges to the design rate,
see [5, Lemma 7]. Assume that the BP fixed point family
$\{\text{BMS}(h), a_h\}$ is smooth and complete. Then the (MAP) GEXIT
function and the BP GEXIT function coincide: $g(h) = g^{\text{BP}}(h)$ for
almost every $h \in [0, 1]$.

Proof: By hypothesis we can apply Theorem 8 to the
BP GEXIT function. We get

$$\int_0^1 g^{\text{BP}}(h)\, dh = 1 - \frac{\int \rho}{\int \lambda}.$$

Further, by the GAT (and applying Fubini's theorem and Fatou's
lemma as in the proof of Theorem 5),

$$\int_0^1 g(h)\, dh \ge 1 - \frac{\int \rho}{\int \lambda}.$$

The proof is completed by noticing that, because of Lemma
9, $g(h) \le g^{\text{BP}}(h)$ for every $h \in [0, 1]$. ■
Proving that the hypotheses of this Corollary hold for some
dd pair $(\lambda, \rho)$ is a challenging task (see also next section). On
the other hand, numerical computations show very clearly that
this is the case, for instance, for cycle ensembles, cf. Example 17.

IX. Regularity of Extended BP GEXIT Curves 

Theorem 6 ensures (for many LDPC ensembles) the existence
of a fixed point pair $(f_x, c_x)$ for each value of $x = H(f_x)$.
However, for applying the extended Area Theorem (Theorem 8) the
resulting family has to be smooth with respect to the parameter
$x$. That this is indeed the case is strongly suggested by the
numerical computation of the EBP curve, cf. Sec. VII. We
provide here some partial analytic results in this direction.

Throughout this section, we denote by $\mathfrak{B}(a)$ the
Bhattacharyya parameter for the L-density $a$. Furthermore, when
assuming communication through the channel BMS(h), we
denote by $B_h$ the Bhattacharyya parameter of the channel.

Lemma 12: Assume communication over the degraded
family $\{\text{BMS}(h)\}_h$ using the dd pair $(\lambda, \rho)$. Then,
for any $h$, there exists at most one fixed point density $f_h$
such that

$$B_h\, \lambda'(1)\, \rho'\!\left(1 - \mathfrak{B}(f_h)^2\right) < 1. \tag{18}$$

Furthermore, if such a density $f_h$ exists, it coincides with the
BP fixed point. Finally, $\mathfrak{B}(f_h)$ is Lipschitz continuous with
respect to $B_h$. More precisely, if the two fixed points $f_{h_1}$, $f_{h_2}$
satisfy the condition $B_{h_i} \lambda'(1) \rho'(1 - \mathfrak{B}(f_{h_i})^2) < 1 - \delta$ for
some $\delta > 0$, then there exists $C = C(\delta, \lambda, \rho)$ such that

$$|\mathfrak{B}(f_{h_1}) - \mathfrak{B}(f_{h_2})| \le C\, |B_{h_1} - B_{h_2}|.$$
Proof: Consider two channel parameters $h_1 < h_2$
and two L-densities $a_1$ and $a_2$ satisfying the condition
$B_{h_i} \lambda'(1) \rho'(1 - \mathfrak{B}(a_i)^2) < 1 - \delta$ for some $\delta > 0$. Assume
that $a_2$ is physically degraded with respect to $a_1$. We prove in
Appendix III that there exists a constant $\alpha = \alpha(\lambda, \rho, \delta) < 1$, depending
on $\delta$, the channel family and the degree distribution, such that

$$|\mathfrak{B}(T_{h_1}(a_1)) - \mathfrak{B}(T_{h_2}(a_2))| \le \alpha\, |\mathfrak{B}(a_1) - \mathfrak{B}(a_2)| + |B_{h_1} - B_{h_2}|. \tag{19}$$

Let us show that this result implies the thesis. Denote by
$f_h$ the BP fixed point for the channel BMS(h) and notice that
any other fixed point $f'_h$ for the same channel is necessarily
physically upgraded with respect to $f_h$. Using the standard
notation, $f_h \succ f'_h$. In fact $\Delta_0 \succ f'_h$. By applying the density
evolution operator, we deduce that $a_h^{\text{BP},\ell} \succ f'_h$, where $a_h^{\text{BP},\ell}$ is
the density after $\ell$ iterations of BP. By taking the limit $\ell \to \infty$
we get $f_h \succ f'_h$.

Next notice that, if $f_h$ satisfies Eq. (18), there cannot be a
distinct fixed point $f'_h$, physically upgraded with respect to $f_h$,
also satisfying Eq. (18). If such a density existed, we could
apply (19) to get

$$|\mathfrak{B}(T_h(f_h)) - \mathfrak{B}(T_h(f'_h))| \le \alpha\, |\mathfrak{B}(f_h) - \mathfrak{B}(f'_h)|,$$

with $\alpha < 1$. But, since $T_h(f_h) = f_h$ and $T_h(f'_h) = f'_h$, this would
imply $\mathfrak{B}(f_h) = \mathfrak{B}(f'_h)$, which is impossible because $f_h \succ f'_h$.

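Let us finally prove Lipschitz continuity, cf. Eq. (19).
Under our hypotheses, the two fixed points $f_{h_1}$, $f_{h_2}$ are the
BP fixed points for channels BMS($h_1$) and BMS($h_2$). Consider
therefore the BP sequences $\{a_{h_1}^{\text{BP},\ell}\}_{\ell \ge 0}$, $\{a_{h_2}^{\text{BP},\ell}\}_{\ell \ge 0}$. For each
$\ell$, $a_{h_1}^{\text{BP},\ell}$ (respectively $a_{h_2}^{\text{BP},\ell}$) is physically degraded with respect
to $f_{h_1}$ (respectively $f_{h_2}$), and therefore satisfies the condition
(18), since the latter does. Furthermore, assuming without
loss of generality $h_2 > h_1$, we have $a_{h_2}^{\text{BP},\ell} \succ a_{h_1}^{\text{BP},\ell}$. Let
$\delta_\ell \triangleq |\mathfrak{B}(a_{h_1}^{\text{BP},\ell}) - \mathfrak{B}(a_{h_2}^{\text{BP},\ell})|$. Clearly $\delta_0 = 0$. By applying Eq. (19), we
get $\delta_{\ell+1} \le \alpha\, \delta_\ell + |B_{h_1} - B_{h_2}|$, and therefore

$$\delta_\ell \le (1 + \alpha + \cdots + \alpha^{\ell-1})\, |B_{h_1} - B_{h_2}| \le \frac{|B_{h_1} - B_{h_2}|}{1 - \alpha}.$$

The thesis follows by taking the $\ell \to \infty$ limit. ■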
It is worth mentioning that the Lipschitz condition Eq. (19)
implies analogous regularity properties for other functionals of
the density $a_h$. For instance, it is easy to show that $|H(f_{h_1}) -
H(f_{h_2})| \le A\, |\mathfrak{B}(f_{h_1}) - \mathfrak{B}(f_{h_2})|$, for some universal constant $A$.
Also, the Bhattacharyya parameter is, for most channel families,
a smooth function of the channel parameter. Regularity with
respect to $B_h$ translates therefore immediately into regularity
with respect to $h$.

In applying the above result, it is helpful to have bounds on
the Bhattacharyya parameter of the fixed point densities.

Lemma 13: Assume communication over the channel
BMS(h) using random codes from the $(\lambda, \rho)$ ensemble. If $f$ is
a fixed point density with Bhattacharyya parameter $b = \mathfrak{B}(f)$,
then

$$b \ge B_h\, \lambda(\tilde{b}), \qquad \tilde{b} \triangleq \sum_r \rho_r \sqrt{1 - (1 - b^2)^{r-1}}.$$

Proof: First notice that $b = B_h \lambda(b')$ where $b' =
\sum_r \rho_r\, \mathfrak{B}(f^{\boxplus (r-1)})$, and $\boxplus$ denotes the convolution at check
nodes. It is convenient to write densities in terms of the
variable $u = \sqrt{1 - \tanh^2(y/2)}$. With a slight abuse of
notation, we use the same symbol $f$ to denote the density with
respect to $u$. We get

$$\mathfrak{B}(f^{\boxplus i}) = \int \sqrt{1 - (1 - u_1^2) \cdots (1 - u_i^2)}\; f(u_1)\, du_1 \cdots f(u_i)\, du_i.$$

The proof is completed by using convexity with respect to
$u_1, \ldots, u_i$ together with the fact that $b = \int u f(u)\, du$. ■
Example 23: Consider the (2, 3) ensemble and communication
over the BSC($\epsilon$). In this case Eq. (18) is equivalent
to

$$\mathfrak{B}(f) > \sqrt{1 - \frac{1}{2 B(\epsilon)}}, \tag{20}$$

where $B(\epsilon)$ denotes the channel Bhattacharyya parameter as a
function of the flip probability, $B(\epsilon) = \sqrt{4\epsilon(1 - \epsilon)}$. Lemma
13 implies (if we neglect the case $\mathfrak{B}(f) = 0$, which corresponds
to a no-error fixed point) $\mathfrak{B}(f) \ge \sqrt{2 - B(\epsilon)^{-2}}$. This lower
bound lies in the region described by equation (20) as soon
as $B(\epsilon) > (\sqrt{17} - 1)/4$, i.e., $\epsilon > \epsilon^*$, with

$$\epsilon^* = \frac{1}{2} - \sqrt{\frac{\sqrt{17} - 1}{32}},$$

which yields $\epsilon^* \approx 0.18759473$. The above results imply that a
unique fixed point density (apart from the no-error one) exists
for any $\epsilon > \epsilon^*$. On the other hand, numerical computations
suggest this to be the case for all values above the local
stability threshold $\epsilon = (2 - \sqrt{3})/4 \approx 0.066987298$. Fig. 15
shows the Bhattacharyya constant of the fixed point density as
a function of the channel parameter of the BSC (solid line),
the bound stated in (20) (dotted line), as well as the bound
$\mathfrak{B}(f) \ge \sqrt{2 - B(\epsilon)^{-2}}$ (dashed line).

Fig. 15. The solid line shows the Bhattacharyya constant of the fixed point
density as a function of the channel parameter of the BSC. The dotted line
corresponds to the bound stated in (20), whereas the dashed curve corresponds
to the bound $\mathfrak{B}(f) \ge \sqrt{2 - B(\epsilon)^{-2}}$.
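
The constants quoted in Example 23 can be reproduced with a few lines of arithmetic. The sketch below assumes the (2, 3) ensemble, for which $\lambda'(1)\rho'(u) = 2u$ and the local stability condition reads $B(\epsilon) = 1/2$. The function names are illustrative.

```python
import math

def bsc_bhattacharyya(eps):
    """Bhattacharyya parameter of BSC(eps): sqrt(4 eps (1 - eps))."""
    return 2.0 * math.sqrt(eps * (1.0 - eps))

def eps_from_bhattacharyya(B):
    """Flip probability in [0, 1/2] of the BSC with Bhattacharyya parameter B."""
    return 0.5 * (1.0 - math.sqrt(1.0 - B * B))

def region_bound(B):
    """Right-hand side of Eq. (20): sqrt(1 - 1/(2B)), clipped at zero."""
    return math.sqrt(max(0.0, 1.0 - 1.0 / (2.0 * B)))

def lemma13_bound(B):
    """Lower bound from Lemma 13 for the (2, 3) ensemble: sqrt(2 - B^-2), clipped at zero."""
    return math.sqrt(max(0.0, 2.0 - B ** -2))

# The two bounds cross where 2 B^2 + B - 2 = 0, i.e. B = (sqrt(17) - 1) / 4.
B_star = (math.sqrt(17.0) - 1.0) / 4.0
eps_star = eps_from_bhattacharyya(B_star)
eps_star_closed = 0.5 - math.sqrt((math.sqrt(17.0) - 1.0) / 32.0)

# Local stability: B * lambda'(0) * rho'(1) = 1 with lambda'(0) = 1 and rho'(1) = 2.
eps_stab = eps_from_bhattacharyya(0.5)

print(f"eps*     = {eps_star:.8f} (closed form {eps_star_closed:.8f})")
print(f"eps_stab = {eps_stab:.9f}")
```

The stability value agrees with $(2 - \sqrt{3})/4$, and the two bounds swap order at $B = (\sqrt{17} - 1)/4$, as stated in the example.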

As stressed in the previous section, EBP curves are expected
to be single-valued smooth functions of the entropy $H(f)$
of the fixed point density. The same expectation holds if
entropy is replaced by any linear functional which preserves
physical degradation. The following result confirms that better
regularity properties can indeed be obtained by taking this
point of view.

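Lemma 14: Let $\{\text{BMS}(h)\}_h$ be a degraded family. Assume
$f_1$ and $f_2$ to be fixed point densities for the channel parameters
$h_1$, $h_2$, and that $f_1$ is physically degraded with respect to $f_2$.
If $\mathfrak{B}(f_1) \ge \delta > 0$, then there exists a constant $C = C(\lambda, \rho, \delta)$
such that

$$|B_{h_1} - B_{h_2}| \le C\, |\mathfrak{B}(f_1) - \mathfrak{B}(f_2)|.$$

In words, the channel is a Lipschitz continuous function of
the Bhattacharyya parameter of the fixed point density.

Proof: Proceeding as in the proof of Eq. (19), cf.
Appendix III, it is easy to show that, if $a_1 \succ a_2$, then

$$|\mathfrak{B}(T_1(a_1)) - \mathfrak{B}(T_1(a_2))| \le \lambda'(1)\rho'(1)\, |\mathfrak{B}(a_1) - \mathfrak{B}(a_2)|.$$

Furthermore, if $\mathfrak{B}(a) \ge \delta > 0$, then $\mathfrak{B}(T_1(a)) \ge \delta'$ for some
$\delta' > 0$.

Consider now the difference $|\mathfrak{B}(T_{h_1}(f_1)) - \mathfrak{B}(T_{h_2}(f_2))|$.
Since $f_1$, $f_2$ are density evolution fixed points, this is equal to
$|\mathfrak{B}(f_1) - \mathfrak{B}(f_2)|$. We get therefore

$$|\mathfrak{B}(f_1) - \mathfrak{B}(f_2)| = |B_{h_1} \mathfrak{B}(T_1(f_1)) - B_{h_2} \mathfrak{B}(T_1(f_2))|$$
$$\ge |B_{h_1} - B_{h_2}|\, \mathfrak{B}(T_1(f_1)) - B_{h_2}\, |\mathfrak{B}(T_1(f_1)) - \mathfrak{B}(T_1(f_2))|$$
$$\ge |B_{h_1} - B_{h_2}|\, \delta' - \lambda'(1)\rho'(1)\, |\mathfrak{B}(f_1) - \mathfrak{B}(f_2)|,$$

which implies the thesis after solving for $|B_{h_1} - B_{h_2}|$. ■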



X. MAP Versus BP Marginals 

As we saw in Sections VII and VIII, the MAP and BP
GEXIT curves are closely related for LDPC codes in the large
blocklength limit. We conjectured that they can be connected 
through the Maxwell construction. In particular, this would 
imply that they are asymptotically equal above the MAP 
threshold for a large family of ensembles, cf. for instance 
Example ^] 

Does the coincidence of GEXIT curves mean that BP and
MAP decoding in fact coincide bit by bit? More precisely,
belief propagation can be regarded as a low complexity
(approximate) algorithm for computing the marginal distributions
$P_{X_i \mid Y}(x_i \mid y)$. It is well established [14] that the BP estimate
is asymptotically correct in the low noise regime $h < h^{\text{BP}}$. We
wonder whether the same is true whenever the two GEXIT
functions coincide.

Perhaps surprisingly, the answer is positive. In order to
proceed, it is convenient to introduce some notation. For the sake
of simplicity we consider the case of a binary channel. Rather
than the marginal distributions $P_{X_i \mid Y}(x_i \mid y)$, it is convenient
to focus on the extrinsic soft bits

$$\mu_i(y) = \mathbb{E}[X_i \mid Y_{\sim i} = y_{\sim i}].$$

We will further denote by $\mu_i^{(\ell)}(y)$ the estimate of this quantity
provided by BP after $\ell$ iterations. Notice that $\mu_i(y) =
\tanh(\phi_i(y_{\sim i}))$, and $\mu_i^{(\ell)}(y) = \tanh \phi_i^{(\ell)}(y_{\sim i})$.

A meaningful measure of how 'incorrect' BP is, is
the mean square error

$$\Delta^{(\ell)}(y) \triangleq \frac{1}{n} \sum_{i=1}^{n} \left| \mu_i^{(\ell)}(y) - \mu_i(y) \right|^2.$$

Let us stress that $\Delta^{(\ell)}(y)$ implies a rather strict notion of
correctness. We are not just requiring the hard decision reached
by BP to be (approximately) the same that would be provided
by a MAP decoder. Rather, BP should be able to reconstruct
the full information about $X_i$, given the received message.

Our main result is presented below (here we refer to the 
Tanner graph associated to the code parity check matrix, which 
is naturally related to belief propagation). 

Theorem 9: Consider communication using a linear code
over a smooth channel BMS(h), and let $Y$ be the channel
output if the input is a uniformly random codeword $X$.
Let $|d|(\cdot)$ denote the GEXIT kernel in the $|D|$-domain and
$K = -\sup\{ |d|''(x) : x \in [0, 1] \} > 0$. Assume that, for
a uniformly random variable node $i$ in the Tanner graph,
the shortest loop through $i$ has length larger than $2\ell$ with
probability at least $1 - \delta$. Then

$$\mathbb{E}\, \Delta^{(\ell)}(Y) \le \frac{2}{K} \left[ g^{\text{BP},\ell}(h) - g(h) \right] + 4\delta.$$

Let us stress that this result holds not just for random elements
of an LDPC$(n, \lambda, \rho)$ ensemble, but for any code with the
prescribed sparseness properties.

The proof makes use of a technical lemma, which we state
below and prove in Appendix IV.

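Lemma 15: Consider a random variable $X$ taking values in
$\{+1, -1\}$ and assume that $X \to Y \to Z$ forms a Markov
chain. Let $k : [0, 1] \to \mathbb{R}$ be twice differentiable with $k'(0) \le
0$, and $k''(x) \le -K < 0$ for any $x \in [0, 1]$. If we denote
$\mu_Y(y) = \mathbb{E}[X \mid Y = y]$ and $\mu_Z(z) = \mathbb{E}[X \mid Z = z]$, then

$$\mathbb{E}[k(|\mu_Y(Y)|)] \le \mathbb{E}[k(|\mu_Z(Z)|)] - \frac{1}{2} K\, \mathbb{E}\!\left[ |\mu_Y(Y) - \mu_Z(Z)|^2 \right].$$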

Proof (Theorem 9): The MAP GEXIT function can be
written as

$$g(h) = \frac{1}{n} \sum_{i=1}^{n} \mathbb{E}\!\left[ |d|(|\mu_i(Y)|) \right].$$

An analogous expression holds for the BP GEXIT function if
we replace $\mu_i(Y)$ with $\mu_i^{(\ell)}(Y)$. We claim that, if the shortest
loop through $i$ in the Tanner graph is longer than $2\ell$, then

$$\mathbb{E}\!\left[ |d|(|\mu_i(Y)|) \right] \le \mathbb{E}\!\left[ |d|(|\mu_i^{(\ell)}(Y)|) \right] - \frac{1}{2} K\, \mathbb{E}\!\left[ (\mu_i(Y) - \mu_i^{(\ell)}(Y))^2 \right]. \tag{21}$$

The thesis follows by rearranging the terms, using the trivial
bound $(\mu_i(Y) - \mu_i^{(\ell)}(Y))^2 \le 4$ whenever the shortest loop
through $i$ is not longer than $2\ell$, and summing over $i$.

In order to prove the above claim, let $Y_{\sim i}^{(\ell)}$ denote the subset
of received signals within a distance $\ell$ from the variable node
$i$ on the Tanner graph. Notice that $X_i \to Y_{\sim i} \to Y_{\sim i}^{(\ell)}$ is
a Markov chain, and that $\mu_i(Y) = \mathbb{E}[X_i \mid Y_{\sim i}]$, $\mu_i^{(\ell)}(Y) =
\mathbb{E}[X_i \mid Y_{\sim i}^{(\ell)}]$. We can therefore apply Lemma 15 with $k(x) =
|d|(x)$. This yields Eq. (21), and thus concludes the proof. ■
One may wonder whether the distortion measure $\Delta^{(\ell)}(y)$ is appropriate. One could, for instance, consider the actual soft bits, rather than the extrinsic ones. If we let $\tilde{\mu}_i(y) = \mathbb{E}[X_i \mid Y = y]$, and denote as $\tilde{\mu}_i^{BP,\ell}(y)$ the corresponding BP estimate, we may define

$$\tilde{\Delta}^{(\ell)}(y) = \frac{1}{n}\sum_{i=1}^{n} \big|\tilde{\mu}_i^{BP,\ell}(y) - \tilde{\mu}_i(y)\big|^2.$$

Recall that hard decoding decisions are taken in terms of $\tilde{\mu}_i(y)$, rather than $\mu_i(y)$. We are therefore interested in knowing whether $\tilde{\Delta}^{(\ell)}(y)$ can be much larger than $\Delta^{(\ell)}(y)$. The answer is generically negative, as shown by the lemma below.

Lemma 16: Assume communication over a BMS channel with L-density $c(l)$. Then

$$\mathbb{E}\,\tilde{\Delta}^{(\ell)}(Y) \le C\, \mathbb{E}\,\Delta^{(\ell)}(Y),$$

where $C = \int e^{2|l|}\, c(l)\, dl$.

The proof is deferred to Appendix IV.

Theorem 9 obviously implies that belief propagation is 'asymptotically correct' every time the BP and MAP GEXIT functions asymptotically coincide. We conjectured in Section VIII that the MAP GEXIT function can be obtained from the EBP one through the Maxwell construction. This construction therefore allows us to determine in which domain of h the BP and MAP GEXIT functions coincide. It is worth stating the final result explicitly for a few simple cases.

Corollary 4: Consider communication over a degraded, smooth and complete family $\{BMS(h)\}_h$, using uniformly random codes from the ensemble LDPC$(n, \lambda, \rho)$, and assume that the rate of this ensemble converges to the design rate. Assume that the BP fixed point family $\{BMS(h), a_h\}_h$ is smooth and complete. Then, for almost every $h \in [0, 1]$,

$$\lim_{\ell\to\infty}\lim_{n\to\infty} \mathbb{E}\,\Delta^{(\ell)}(Y) = 0.$$
The proof follows easily from Corollary y] 

A somewhat more general statement is the following. 

Corollary 5: Consider communication over a degraded, smooth and complete family $\{BMS(h)\}_h$, using uniformly random codes from the ensemble LDPC$(n, \lambda, \rho)$, and assume that the rate of this ensemble converges to the design rate. Assume that the upper bound on the MAP threshold in Theorem 5 is tight: $h^{MAP} = \bar{h}$. Then, for almost any $h \in [\bar{h}, 1]$,

$$\lim_{\ell\to\infty}\lim_{n\to\infty} \mathbb{E}\,\Delta^{(\ell)}(Y) = 0.$$

Proof: Proceeding as in the proof of Theorem 5, one obtains that

$$\int_{\bar{h}}^{1} g(h)\, dh = \int_{\bar{h}}^{1} g^{BP}(h)\, dh = r.$$

Since $g(h) \le g^{BP}(h)$ for all $h$, we have necessarily $g(h) = g^{BP}(h)$ for almost any $h \in [\bar{h}, 1]$. The thesis follows by applying Theorem 5. ■

XI. Why We Can Not Surpass Capacity: The
Matching Condition

The upper bound $\bar{h}$ on the MAP threshold, cf. Theorem 5, cannot be larger than the Shannon threshold $1 - r$. This follows by noticing that the GEXIT kernel is not larger than 1, and implies that iterative coding systems do not allow reliable communication above capacity. Of course, this result is also a straightforward consequence of Shannon's channel coding theorem. In this section we shall provide yet another proof of this basic fact. The interest of the new proof is three-fold: (i) it does not assume communication over a smooth channel family; (ii) it uses only quantities appearing in density evolution (and not just fixed points); (iii) component codes (and their 'matching') play a crucial role.

For general BMS channels, and motivated by the geometric statement observed for the BEC and the relationship between the derivative of the mutual information and the MSE introduced by [7], [26], a similar chart, called the MSE chart, was constructed by Bhattad and Narayanan [35]. Assuming that the input densities to the component codes are Gaussian, this chart again fulfills the Area Theorem. In order to apply the MSE chart in the context of iterative coding the authors proposed to approximate the intermediate densities which appear in density evolution by "equivalent" Gaussian densities. This was an important first step in generalizing the matching condition to the whole class of BMS channels. In the following we show how to overcome the need for making the Gaussian approximation by using GEXIT functions.

To start, let us review the case of transmission over the BEC(h) using a degree distribution pair $(\lambda, \rho)$. In this case density evolution is equivalent to the EXIT chart approach and the condition for successful decoding under BP reads

$$c(x) = 1 - \rho(1 - x) \le \lambda^{-1}(x/h) = v_h^{-1}(x).$$

This is shown in Fig. 16 for the degree distribution pair $(\lambda, \rho)$ considered there. The area under the curve $c(x)$ equals $1 - \int\rho$ and the area to the left of the curve $v_h^{-1}(x)$ is equal to $h\int\lambda$. By the previous remarks, a necessary condition for successful BP decoding is that these two areas do not overlap. Since the total area equals 1 we get the necessary condition $h\int\lambda + 1 - \int\rho \le 1$. Rearranging terms, this is equivalent to the condition

$$h \le \frac{\int\rho}{\int\lambda} = 1 - r(\lambda, \rho).$$

[Figure: EXIT chart with axis $h_{\text{out-variable}} = h_{\text{in-check}}$, showing the curves $c(x)$ and $v_h^{-1}(x)$ for $h = 0.58$.]
Fig. 16. The EXIT chart method for the degree distribution pair $(\lambda, \rho)$ and transmission over the BEC(h = 0.58).

In words, the design rate $r(\lambda, \rho)$ of any LDPC ensemble which, for increasing block lengths, allows successful decoding over the BEC(h), can not surpass the Shannon limit $1 - h$. An argument very similar to the above was introduced
by Shokrollahi and Oswald [36], [37] (albeit not using the 
language and geometric interpretation of EXIT functions and 
applying a slightly different range of integration). It was the 
first bound on the performance of iterative systems in which 
the Shannon capacity appeared explicitly using only quantities 
of density evolution. A substantially more general version of 
this bound can be found in [16], [38], [39]. The extension 
to parallel turbo schemes is addressed in [40], [41]. See also 
[42]. 
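
The area argument for the BEC can be checked with a few lines of exact arithmetic. The sketch below is only an illustration: the (3, 6)-regular pair $\lambda(x) = x^2$, $\rho(x) = x^5$ and the helper names `integral`, `design_rate` and `areas_fit` are assumptions made for this example, not part of the original derivation.

```python
from fractions import Fraction

def integral(poly):
    """Integral over [0, 1] of sum(c * x**e), with poly given as {e: c}."""
    return sum(Fraction(c) / (e + 1) for e, c in poly.items())

def design_rate(lam, rho):
    """Design rate r = 1 - (int rho) / (int lambda)."""
    return 1 - integral(rho) / integral(lam)

def areas_fit(h, lam, rho):
    """Necessary condition h * int(lambda) + 1 - int(rho) <= 1."""
    return h * integral(lam) + 1 - integral(rho) <= 1

# Assumed example: (3, 6)-regular ensemble in edge perspective.
lam = {2: 1}  # lambda(x) = x^2
rho = {5: 1}  # rho(x) = x^5

rate = design_rate(lam, rho)
fits_below = areas_fit(Fraction(2, 5), lam, rho)  # h = 0.4 <= 1 - r
fits_above = areas_fit(Fraction(3, 5), lam, rho)  # h = 0.6 >  1 - r
print(rate, fits_below, fits_above)
```

The condition is only necessary: it holds for every $h \le 1 - r$, including values of $h$ above the BP threshold of the ensemble, where BP decoding nevertheless fails.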

Although the final result (namely that transmission above 
capacity is not possible) is trivial, the method of proof is well 
worth the effort since it shows how capacity enters in the 
calculation of the performance of iterative coding systems. By 
turning this bound around, we can find conditions under which 
iterative systems achieve capacity: In particular it shows that 
the two component-wise EXIT curves have to be matched per- 
fectly. Indeed, all currently known capacity achieving degree- 
distributions for the BEC can be derived by starting with this 
perfect matching condition and working backwards. Let us 
now show that, by using component-wise GEXIT functions, 
the perfect matching condition holds in the general case. This 
might in the future serve as a starting point to find capacity- 
achieving degree distributions for general BMS channels. We 
need one preliminary definition. 



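Definition 9 (Interpolating Channel Families): Consider a degree distribution pair $(\lambda, \rho)$ and transmission over the BMS channel characterized by its L-density $c$. Let $a_{-1} = \Delta_0$ and $a_0 = c$ and set $a_\alpha$, $\alpha \in [-1, 0]$, to $a_\alpha = -\alpha\, a_{-1} + (1 + \alpha)\, a_0$. The interpolating density evolution families $\{a_\alpha\}_{\alpha=-1}^{\infty}$ and $\{b_\alpha\}_{\alpha=0}^{\infty}$ are then defined as follows:

$$b_\alpha = \sum_i \rho_i\, a_{\alpha-1}^{\boxplus (i-1)}, \qquad \alpha \ge 0,$$

$$a_\alpha = \sum_i \lambda_i\, c \star b_\alpha^{\star (i-1)}, \qquad \alpha \ge 0,$$

where $\star$ denotes the standard convolution of densities and $a \boxplus b$ denotes the density at the output of a check node, assuming that the input densities are $a$ and $b$, respectively.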
Discussion: First note that $a_\ell$ ($b_\ell$), $\ell \in \mathbb{N}$, represents the sequence of L-densities of density evolution emitted by the variable (check) nodes in the $\ell$-th iteration. By starting density evolution not only with $a_0 = c$ but with all possible convex combinations of $\Delta_0$ and $c$, this discrete sequence of densities is completed to form a continuous family of densities ordered by physical degradation. The fact that the densities are ordered by physical degradation can be seen as follows: note that the computation tree for $a_\alpha$ can be constructed by taking the standard computation tree of $a_{\lceil\alpha\rceil}$ and independently erasing the observation associated to each variable leaf node with probability $\lceil\alpha\rceil - \alpha$. It follows that we can convert the computation tree of $a_\alpha$ to that of $a_{\alpha-1}$ by erasing all observations at the leaf nodes and by independently erasing each observation in the second (from the bottom) row of variable nodes with probability $\lceil\alpha\rceil - \alpha$. The same statement is true for $b_\alpha$. If $\lim_{\ell\to\infty} H(a_\ell) = 0$, i.e., if BP decoding is successful in the limit of large blocklengths, then the families are both complete.
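
For intuition, the interpolation can be made fully explicit in the special case of the BEC, where every density is described by a single erasure probability. The following sketch is an illustration under stated assumptions only: it specializes the definition to the BEC(h), assumes a $(d_v, d_c)$-regular ensemble, and the function name `a_erasure` is hypothetical.

```python
def a_erasure(alpha, h, dv=3, dc=6):
    """Erasure probability of the interpolated variable-node density a_alpha
    on the BEC(h) for a (dv, dc)-regular ensemble (assumed special case)."""
    if alpha < -1.0:
        raise ValueError("alpha must be at least -1")
    if alpha <= 0.0:
        # Convex combination of Delta_0 (erasure probability 1) and c (erasure h).
        return -alpha * 1.0 + (1.0 + alpha) * h
    # Check-node step: b_alpha is the check output for input a_{alpha-1}.
    b = 1.0 - (1.0 - a_erasure(alpha - 1.0, h, dv, dc)) ** (dc - 1)
    # Variable-node step: a_alpha combines the channel with dv - 1 check messages.
    return h * b ** (dv - 1)

h = 0.4
grid = [k / 4 for k in range(-4, 41)]        # alpha from -1 to 10 in steps of 1/4
values = [a_erasure(alpha, h) for alpha in grid]
for alpha, value in zip(grid[:9], values[:9]):
    print(f"alpha = {alpha:5.2f}   erasure probability = {value:.4f}")
```

At integer values of `alpha` the function returns the usual density evolution sequence, and in between it fills the gaps so that the erasure probability is non-increasing in `alpha`, which is the BEC form of the ordering by physical degradation.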

Example 24 (Density Evolution and Interpolation): Consider transmission over the BSC($\epsilon = 0.07$) using a (3, 6)-regular ensemble. Fig. 17 depicts the density evolution process for this case. This process gives rise to the sequences of densities $\{a_\ell\}$ and $\{b_\ell\}$. Fig. 18 shows the interpolation of these sequences for the choices $\alpha$ = 1.0, 0.95, 0.9 and 0.8 and the complete such family.

[Figure: two panels with horizontal axis $H(a)$.]
Fig. 17. Density evolution for (3, 6)-regular ensemble over BSC(0.07).

[Figure: panels for $\alpha = 1.0$, $\alpha = 0.95$, $\alpha = 0.9$, ...; horizontal axis $H(a)$.]
Fig. 18. Interpolation of densities.

Lemma 17: Consider a degree distribution pair $(\lambda, \rho)$ and transmission over a BMS channel characterized by its L-density $c$ so that density evolution converges to $\Delta_\infty$. Let $\{a_\alpha\}_{\alpha=-1}^{\infty}$ and $\{b_\alpha\}_{\alpha=0}^{\infty}$ denote the interpolated families as defined in Definition 9.

Then the two GEXIT curves parameterized by

$\{H(a_\alpha), G(a_\alpha, b_{\alpha+1})\}$, GEXIT of check nodes

$\{H(a_\alpha), G(a_\alpha, b_\alpha)\}$, inverse of dual GEXIT of variable nodes

do not cross and faithfully represent density evolution. Further, the area under the "check-node" GEXIT function is equal to $1 - \int\rho$ and the area to the left of the "inverse dual variable node" GEXIT function is equal to $H(c)\int\lambda$. It follows that $r(\lambda, \rho) \le 1 - H(c)$, i.e., the design rate can not exceed the Shannon limit.

Proof: First note that $\{H(a_\alpha), G(a_\alpha, b_{\alpha+1})\}$ is the standard GEXIT curve representing the action of the check nodes; $a_\alpha$ corresponds to the density of the messages entering the check nodes and $b_{\alpha+1}$ represents the density of the corresponding output messages. On the other hand, $\{H(a_\alpha), G(a_\alpha, b_\alpha)\}$ is the inverse of the dual GEXIT curve corresponding to the action at the variable nodes: now the input density to the check nodes is $a_\alpha$ and $b_\alpha$ denotes the corresponding output density.

The fact that the two curves do not cross can be seen as follows. Fix an entropy value. This entropy value corresponds to a density $a_\alpha$ for a unique value of $\alpha$. The fact that $G(a_\alpha, b_\alpha) \ge G(a_\alpha, b_{\alpha+1})$ now follows from the fact that $b_{\alpha+1} \prec b_\alpha$ and that for any symmetric $a_\alpha$ this relationship stays preserved by applying the GEXIT functional according
to Corollary n 

The statements regarding the areas of the two curves follow in a straightforward manner from the GAT and Lemma 8. The bound on the achievable rate follows in the same manner as for the BEC: the total area of the GEXIT box equals one and the two curves do not overlap and have areas $1 - \int\rho$ and $H(c)\int\lambda$. It follows that $1 - \int\rho + H(c)\int\lambda \le 1$, which is equivalent to the claim $r(\lambda, \rho) \le 1 - H(c)$. ■



We see that the matching condition still holds for general channels. There are a few important differences between the general case and the simple case of transmission over the BEC. For the BEC, the intermediate densities are always the BEC densities independent of the degree distribution. This of course enormously simplifies the task. Further, for the BEC, given the two EXIT curves, the progress of density evolution is simply given by a staircase function bounded by the two EXIT curves. For the general case, this staircase function still has vertical pieces but the "horizontal" pieces have in general a non-vanishing slope. This is true since the y-axis for the "check node" step measures $G(a_\alpha, b_{\alpha+1})$, but in the subsequent "inverse variable node" step it measures $G(a_{\alpha+1}, b_{\alpha+1})$. Therefore, one should think of two sets of labels on the y-axis, one measuring $G(a_\alpha, b_{\alpha+1})$, and the second one measuring $G(a_{\alpha+1}, b_{\alpha+1})$. The "horizontal" step then consists of first switching from the first y-axis to the second, so that the labels correspond to the same density $b$, and then drawing a horizontal line until it crosses the "inverse variable node" GEXIT curve. The "vertical" step stays as before, i.e., it really corresponds to drawing a vertical line. All this is certainly best clarified by a simple example.

Example 25 ((3, 6) Ensemble and Transmission over BSC): Consider the (3, 6)-regular ensemble and transmission over the BSC(0.07). The corresponding illustrations are shown in Fig. 19. The top-left figure shows the standard GEXIT curve for the check node side. The top-right figure shows the dual GEXIT curve corresponding to the variable node side. In order to use these two curves in the same figure, it is convenient to consider the inverse function for the variable node side. This is shown in the bottom-left figure. In the bottom-right figure both curves are shown together with the "staircase" like function which represents density evolution. As we see, the two curves do not overlap and both have the correct areas.

As remarked earlier, one potential use of the matching condition is to find capacity approaching degree distribution pairs. Let us quickly outline a further such potential application. Assuming that we have found a sequence of capacity-achieving degree distributions, how does the number of required iterations scale as we approach capacity? It has been conjectured that the number of required iterations scales like $1/\delta$, where $\delta$ is the gap to capacity. This conjecture is based on the geometric picture which the matching condition implies. To make things simple, imagine the two GEXIT curves as two parallel lines, let us say both at a 45 degree angle, a certain distance apart, and think of density evolution as a staircase function. From the previous results, the area between the lines is proportional to $\delta$. Therefore, if we halve $\delta$ the distance between the lines has to be halved and one would expect that we need twice as many steps. Obviously, the above discussion was based on a number of simplifying assumptions. It remains to be seen if this conjecture can be proven rigorously.
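
The geometric heuristic can be mimicked by a toy computation. The sketch below is not density evolution: it assumes the idealized picture of two parallel 45 degree lines $y = x$ and $y = x - \text{gap}$ inside the unit box, and the function name `staircase_steps` is hypothetical.

```python
from fractions import Fraction

def staircase_steps(gap):
    """Number of staircase steps between the lines y = x and y = x - gap,
    starting at x = 1 and stopping once x <= 0 (toy model, exact arithmetic)."""
    x = Fraction(1)
    steps = 0
    while x > 0:
        # A vertical move down to the lower line followed by a horizontal
        # move back to the upper line reduces x by exactly `gap`.
        x -= gap
        steps += 1
    return steps

for denominator in (10, 20, 40):
    gap = Fraction(1, denominator)
    print(f"gap = {gap}   steps = {staircase_steps(gap)}")
```

In this toy model halving the distance between the lines doubles the number of steps, which is the $1/\delta$ behaviour suggested by the conjecture. Real GEXIT curves are neither straight nor parallel, so this is only the back-of-the-envelope picture described in the text.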

XII. Conclusion 

Since the introduction of EXIT functions for the analysis of iterative coding systems [17]-[21], researchers have tried to substantiate theoretically the empirical area rules that these seemed to satisfy. In this paper we showed how to prove these rules in a very general setting. The price to pay was to replace EXIT functions by GEXIT functions. Fortunately, GEXIT functions are as simple to compute as ordinary EXIT functions and share in general many of their properties.

Fig. 19. Faithful representation of density evolution by two non-overlapping component-wise GEXIT functions which represent the "actions" of the check nodes and variable nodes, respectively. The area between the two curves is proportional to the additive gap to capacity.

We also presented several applications of this new tool. Most notably: (i) It allows one to prove an upper bound on the MAP threshold which is conjectured to coincide with the actual threshold for several classes of ensembles (e.g. regular ones). (ii) Via extended BP GEXIT curves, it provides some constraints on the relation between BP and MAP decoding. These constraints lead naturally to the Maxwell construction which provides the precise connection between the two. In particular we found that the BP soft bit estimates are asymptotically exact for a noise range above threshold. (iii) It implies a matching constraint on component codes of capacity-achieving systems.

These results open many research directions. It is worth listing a few of them.

Prove existence, uniqueness and regularity properties of asymptotic MAP and extended BP GEXIT curves. In particular, we expect that the last one is a smooth single valued function of the entropy of the fixed point density. The iterative procedure which we presented in Section VIII only proves that for each message entropy there is at least one fixed point of density evolution. But, empirically, when running this algorithm, we found that indeed there seems to be a unique such fixed point and that all these fixed points seem to form a smooth manifold. Further, we proved several partial results in this direction (for instance existence for EBP curves, uniqueness for MAP curves, etc). However, the general question remains open.

Prove that the Maxwell construction indeed provides the correct connection between MAP and BP GEXIT curves. As a particular case (which may well be simpler than the general statement), prove that the upper bound $\bar{h}$ is indeed tight for some selected ensembles, e.g. for regular ones.

Use the interpolation construction of Section XI to prove a lower bound on the number of message passing iterations as a function of the gap to capacity.

Appendix I
GEXIT Kernel over Gaussian Channels

This appendix contains a few useful results concerning the 
GEXIT kernel for Gaussian channels. 

Lemma 18 (GEXIT Kernel, L-Domain - $\{BAWGNC(h)\}_h$): Consider the family $\{c_{BAWGNC(h)}\}_h$ of BAWGN channels, where $h$ denotes the channel entropy. The channel model is therefore $Y = X + N$, where $X$ takes values $x \in \mathcal{X} = \{-1, +1\}$ and $N$ is Gaussian with zero mean and variance $\sigma^2$. Then the following represent equivalent kernels:



^CbaWGNC(Ii) 

^'cBAWGNC(h) 
^ CBAWGNC(h) 



/- 



+00 — J 

OO (cOBh(i 



1 -E[E[x|y,$ = , 
i-E[E[x|y]2] 

1 -E[x|y,$ = z] 
1 -E[x|r] ■ 



(i) 

(ii) 
(iii) 



Hereby, $\Phi$ denotes a further observation of $X$ which is conditionally independent of $Y$, which is the result of passing $X$ through a symmetric channel, and which is assumed to be in log-likelihood form (if we use coding, $\Phi$ represents the extrinsic estimate of $X$ in the L-domain).

Discussion: This lemma provides several equivalent representations of the kernel for the BAWGN channel. The expression (ii) shows the relationship between conditional entropy and mean-square error (MSE) estimator. To see this, observe first that the denominator is a ($z$ independent) scaling factor depending on our parameterization of the channel through its entropy $h$. Second, observe that the numerator $1 - \mathbb{E}[\mathbb{E}[X \mid Y, \Phi = z]^2] = \mathbb{E}[\mathbb{E}[X^2 \mid Y, \Phi = z] - \mathbb{E}[X \mid Y, \Phi = z]^2]$ is the mean-square error estimator (which in this framework includes the decoding estimate $z$). This elegant relationship which connects a fundamental information theoretic quantity (the conditional entropy, or, equivalently, the mutual information) to a measure widely used in signal processing was first observed by Guo, Shamai and Verdu in [7], [26]. In the above lemma, the channel inputs are binary. In Lemma 20 we give an alternative way of deriving $l^{c_{BAWGNC(h)}}(z)$ in the more general context of non-binary channel inputs.

The form (iii) provides a further simplification. This expression, in which the numerator shows the magnetization, was first stated in [8] using the Nishimori identity (in the context of coding, this identity was first discussed in [28]).
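
The step from form (ii) to form (iii) rests on the identity $\mathbb{E}[\tanh(L/2)] = \mathbb{E}[\tanh^2(L/2)]$ for a symmetric L-density. The following sketch checks it numerically for the BAWGN channel. It assumes the standard fact that, under the all-one assumption, the log-likelihood ratio is Gaussian with mean $2/\sigma^2$ and variance $4/\sigma^2$; the helper names `bawgn_l_density` and `expect` and the choice $\sigma = 1$ are illustrative only.

```python
import math

def bawgn_l_density(l, sigma):
    """L-density of the BAWGN channel under the all-one assumption:
    Gaussian with mean 2/sigma^2 and variance 4/sigma^2 (assumed form)."""
    mean = 2.0 / sigma ** 2
    var = 2.0 * mean
    return math.exp(-(l - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def expect(f, sigma, n=20001, width=12.0):
    """Trapezoidal approximation of E[f(L)] over mean +/- width standard deviations."""
    mean = 2.0 / sigma ** 2
    std = math.sqrt(2.0 * mean)
    a, b = mean - width * std, mean + width * std
    step = (b - a) / (n - 1)
    total = 0.0
    for k in range(n):
        l = a + k * step
        weight = 0.5 if k in (0, n - 1) else 1.0
        total += weight * f(l) * bawgn_l_density(l, sigma)
    return total * step

sigma = 1.0
soft_bit_mean = expect(lambda l: math.tanh(l / 2.0), sigma)
soft_bit_second_moment = expect(lambda l: math.tanh(l / 2.0) ** 2, sigma)
print(soft_bit_mean, soft_bit_second_moment)
```

The two printed numbers agree up to the accuracy of the quadrature, so the denominator of (ii) can be replaced by $1 - \mathbb{E}[\tanh(L/2)]$ as in (iii).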

Before proving Lemma 18, let us recall the following well-known fact which will be used several times in the following: Consider a BMS channel $p_{Y|X}(y|x)$ and $f(y)$, a measurable function. If $f(y)$ is even, then $\mathbb{E}_Y[f(Y)] = \mathbb{E}_{Y|X=1}[f(Y)]$.
Proof: Under the all-one assumption, the channel density is 



(i) The kernel as stated in Lemma 3 is expressed in terms of the derivative of $c(w)$ with respect to the channel parameter. To get a more pleasing analytic expression we use the fact that for the Gaussian case we can express this derivative via the identity $\frac{\partial c(w)}{\partial \epsilon} = -\frac{\partial c(w)}{\partial w} + \frac{\partial^2 c(w)}{\partial w^2}$ for the parameterization $\epsilon = 2/\sigma^2$. Then using twice integration by parts (as in [8]), we get

de /■+°° dciw) 
;cBAw™c(.) (z)—^ / -4-!- log(l + e-"'~-')dii; 
dh ,/_^ de 



— OO 



dc{w) e 



dw 1 + e 



-dw 



+00 



c{w) 



+00 



c{w)- 



1 + e-™-^ 
-1 



-dw 



— e 



OO 

— 2 r+oo 



1 + e.w+z^2 
c{-w) 



dw 



(C0sh(^))2 



-dw. 



The computation of ^ is exactly the same if we set z ~ 0. 
Therefore, 



^CBAWGNC(h) 



J-00 (cosh(ii^))2 

r+oo e 4; 
J —oc (cosh( ^ ))'^ 



(ii) First, we claim that the previous expression can be written as

$$l^{c_{BAWGNC(h)}}(z) = e^{-z}\, \frac{1 - \mathbb{E}[\mathbb{E}[X \mid Y, \Phi = z]^2]}{1 - \mathbb{E}[\mathbb{E}[X \mid Y]^2]}.$$



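To see this, observe that

$$w + z \overset{(a)}{=} \log\frac{p_{Y|X}(w \mid +1)}{p_{Y|X}(w \mid -1)} + \log\frac{p_{\Phi|X}(z \mid +1)}{p_{\Phi|X}(z \mid -1)} \overset{(b)}{=} \log\frac{p_{Y,\Phi|X}(w, z \mid +1)}{p_{Y,\Phi|X}(w, z \mid -1)} \overset{(c)}{=} \log\frac{p_{X|Y,\Phi}(+1 \mid w, z)}{p_{X|Y,\Phi}(-1 \mid w, z)},$$

where (a) comes from the definition of $w$ and $z$ in Lemma 18, (b) from the independence of $Y$ and $\Phi$ when $X$ is given, and where (c) is the Bayes rule using $p_X(+1) = p_X(-1) = \frac{1}{2}$.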



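Therefore,

$$\tanh\Big(\frac{w + z}{2}\Big) = \frac{1 - e^{-(w+z)}}{1 + e^{-(w+z)}} = \frac{p_{X|Y,\Phi}(+1 \mid w, z) - p_{X|Y,\Phi}(-1 \mid w, z)}{p_{X|Y,\Phi}(+1 \mid w, z) + p_{X|Y,\Phi}(-1 \mid w, z)} = \mathbb{E}[X \mid w, z].$$

This quantity (which is often called "soft bit" as in [43]) is a bit estimate in the $D$-domain and the relationship $\mathbb{E}[X \mid w, z] = \tanh(\frac{w+z}{2})$ is in fact well-known. Therefore,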
(tanh(i^))2 _ 1 



since 1 



(cosh(i^))2' 

l-/_^^cH(tanh(H±£))2d^ 



1 



1-r 



00 
00 



c(w)(tanh(f ))2dw 



Ev|x=i[(tanh(^))2] 



1 -Ev 



J(tanh(^))2 



and the claim follows since, as discussed above, we can drop 
in the last expression the conditioning on X = 1. 

Second, as discussed in Example 3, the kernel is in general not unique in the L-domain and we can use this degree of freedom to get alternative kernels. Denote $f(z) = \frac{1 - \mathbb{E}[\mathbb{E}[X \mid Y, z]^2]}{1 - \mathbb{E}[\mathbb{E}[X \mid Y]^2]}$ and observe that $l^{c_{BAWGNC(h)}}(z) = \exp(-z) f(z)$ with this notation. Then, for any symmetric density $a(z)$, the function $l'^{\,c_{BAWGNC(h)}}(z) = f(-z)$ is also a valid kernel for the L-domain since $\int_{-\infty}^{+\infty} a(z) e^{-z} f(z)\, dz = \int_{-\infty}^{+\infty} a(z) f(-z)\, dz$. Therefore, an alternative kernel is


^ CBAWGNC(h) = 



l-E[E[X|y,z]2] 

1 -E[E[x|y]2] 



+00 



X 



+00 



-dw 



(iii) For any symmetric random variable $L$, a straightforward exercise shows that $\mathbb{E}[\tanh(L/2)] = \mathbb{E}[(\tanh(L/2))^2]$. See, e.g., [8], [28]. Applied to the symmetric random variable $L = \log\frac{p_{Y|X}(Y \mid +1)}{p_{Y|X}(Y \mid -1)} = \frac{2}{\sigma^2} Y$ under the all-one assumption, this gives us $\mathbb{E}[\mathbb{E}[X \mid Y]^2] = \mathbb{E}[\tanh(L/2)^2] = \mathbb{E}[\tanh(L/2)] = \mathbb{E}[X \mid Y]$. Therefore the denominator can be easily written as $1 - \mathbb{E}[\mathbb{E}[X \mid Y]^2] = 1 - \mathbb{E}[X \mid Y]$. We cannot apply directly this argument for the term $\mathbb{E}[\mathbb{E}[X \mid Y, z]^2] = \mathbb{E}[\tanh(\frac{Y}{\sigma^2} + \frac{z}{2})^2]$ at the numerator (the random variable $\frac{2}{\sigma^2} Y + z$ being not symmetric). However, we can look for an equivalent kernel. This is easily done by observing that the values $z$ are provided by the symmetric random variable $\Phi$. The sum of two symmetric random variables is again symmetric, therefore $\frac{2}{\sigma^2} Y + \Phi$ is symmetric. See, e.g., [11]. We can then use the fact that $\mathbb{E}[\tanh(L/2)] = \mathbb{E}[(\tanh(L/2))^2]$ with $L = \frac{2}{\sigma^2} Y + \Phi$ to write $\mathbb{E}_{Y,\Phi}[\mathbb{E}_X[X \mid Y, \Phi]^2] = \mathbb{E}_{Y,\Phi}[\mathbb{E}_X[X \mid Y, \Phi]]$. Therefore,



^ CBAWGNC(h) (^2^ 



l-E[X[Y,z] ^ I 
l-E[X[Y] 



+ 00 
00 



+ 00 
OO 



-dw 



is an equivalent kernel (but pointwise different from 

/CBAWGNc(h) and /'^BAWGNc(h)(2;)). The last equality comes from 
the fact that 1 - E[X\Y,z] = 1 - Ey[tanh(^)] - 
GEXIT and EXIT curves are in general very similar. The next lemma illuminates this fact: it shows that, in the limit of small SNR, the kernel for the BAWGNC behaves similarly to the kernel for the BSC discussed in Example 6.

Lemma 19 (Limiting Behavior of GEXIT Kernel): Consider the family $\{c_{BAWGNC(h)}\}_h$ of BAWGN channels, where $h$ denotes the channel entropy: the additive noise in the model $Y = X + N$ is Gaussian with zero mean and variance $\sigma^2$. Then

$$\lim_{\sigma \to \infty} |d|^{c_{BAWGNC(h)}}(s) = 1 - s^2, \qquad \text{(i)}$$

$$\lim_{\sigma \to 0} |d|^{c_{BAWGNC(h)}}(s) = 1. \qquad \text{(ii)}$$

In the $|D|$-domain, the kernels are ordered between those two extremal functions.

Proof: First recall the transform formula (|9} and 
(i) With expression (iii) of Lemma [Tsl 

c(;) tanh(V2+tanh"^(s))di 



2tanh"'(s) = logi^ 



we have /'^(2tanh (s)) 



c(l) tanh(i/2)di 



Let us restrict ourself to the study of the term Ia{s) ~ 
/+^c(/)tanh(Z/2 + tanh"^(s))dL When oo, then 

the distribution of the channel inputs (more exactly of the 
LLR's in the L-domain) c(/) = cxp(- '"'^'^g^'"'^' ) 

becomes a Dirac centered in (since its variance 4/(7^ 
0). For any function continuous in 0, e.g., for the function 
ks : I ^ tanh(//2 + tanh~^(s)), one can indeed replace, 
without committing much error when cr^ oo, the integral 
c{l)ksll)Al by /+^c(Ofc.(0)dL See, e.g., [44] for fur- 
ther details. Therefore 



Ia{z) — > tanh(0/2 + tanh~^(s)) = s. 

(T — ^OO 

Using (|9}, we finally get 

l,lr/\ 1-Sl + ,S 1 + sl — S ^ n 

Wis) ^ \ = 1 - S. 

' ' ^ 2 12 1 

(ii) The case ct ^ corresponds to the full knowledge of the 
channel input. The kernel in the jZ?! -domain converges point- 
wise to 1. As used for density evolution, see [11], in this case 
c(Z) becomes a Dirac at oo a a similar argument as for (i) can 
be applied. 

For e e (0, 1), the kernels in the iDj— domain are ordered 
because of Lemma ■ 

As discussed before Lemma the pleasing relationship 
presented in [7], [45] or [8] emerges for the BAWGNC. So 
far we have restricted ourself to the case of binary inputs. But 
the non-binary case as discussed in [7], [9], [45] is not much 
harder This is presented in Lemma |20l using our framework. 

Lemma 20 (AWGNCh.)): Consider a length n code, call 
it G. Assume transmission takes place over a family 
{AWGNG{)\i)}i(z[n] where there is a global parameter e such 
that lii(e) = h(e) is the entropy associated to the channel 
for aU i € [n]. Let this parameter be e = — 2snr = — Then 

g,(G,e)=E[E[Xf|r]-E[X,|r]2]. 

In words, the derivative of the conditional entropy with respect to the particular parameter $\epsilon$ is equal to the Mean-Square Error (MSE) estimator.



Proof: We will prove the result in general settings when 
the input alphabet X can be any subset of R. Temporarily, let 
us denote Y ^ X + N our running Gaussian channel model. 
N is the additive white Gaussian noise with zero-mean and 
variance a^. Now let us normalize this model by to get 
the equivalent model Y = ^JsmX + N where snr = 4j- and 
N is an additive white Gaussian noise with zero-mean and 
unit- variance. In order to be a sufficient statistics, the extrinsic 
MAP estimate 0^ = can no longer be a log-likelihood 

ratio but, in general, a function of Xi, i.e., (p^ : x ^ (p^d/r^i, x). 
From 0, it follows that 



5z(G,e) 




p{xi)p{<l).i I x{)—p{y,\x^)- 
de 



P{x'^\(|)^)p{V^\x'i) 

p{xi\(Pi)p{yi\^i) 



dx- AxiAyi- 



To simplify the computations, a few remarks are of order 
First recall that we have chosen e to be e = — 2snr = 
Second, observe that the Gaussian density permits us to write 
''^^d'j'"'^ = ^d^P(2/i|a^0- Therefore, integrating by parts 
with respect to yi, we get 



.9.(G,e) 



p{xi)p{4>^ I Xi) 





snr 

P{x[\(|)i)p{v^\x[) 
P{Xi\4'^)p{jJt\x^i) 



P{xi)p{4>i I Xi)^=p{yi\x^)- 
Vsnr 

L[ Vsnr(a;;; - x,)p{x',\(l),)p{y^\x',)Ax'^ 

!x[ Pi^'^\4'^)Piy^Wi)^A 



dxidj/id^j, 



after having used — 
^/smx[)p{yi\x'^). Let us now re-order as p{x'j\4ii)p{yi\x^) = 
p{x'^\4>i, yi)p{yi\4>i) and use (with a slight abuse of notations) 
Ex. toget 



P{xi)p{(t)i \xi)xip{yi\xi)- 
—axiayidcp^ 



4'i-Vi 



p{yt\4>i) 
p{<Pi,yi) 

p{xi\yi,(l)i) (^xf 

p{<Pi,yi) 



(yi + (j)i)xi 



/snr 



dxidi/id^j 



't>i,Vi 



■ (Ex. - Ex. [X,|0„2/,]')dy,d,^,. 

This concludes our proof since $i is a sufficient statistic for 
Appendix II
Physical Degradation: a Calculus Proof

In this appendix we provide a direct calculus proof of 
Corollary n exploiting the explicit representation provided by 
Lemma 3. As a byproduct we show that the GEXIT kernel in the $|D|$-domain is non-increasing and concave. This fact is also used in the proof of Lemma 6.

For our purpose it is convenient to represent all quantities in the $|D|$-domain. Let $\{|c_{BMS(h)}|\}_h$ denote the family of $|D|$-densities characterizing the channel family. Let $|d|^{c_{BMS(h)}}(w)$ denote the GEXIT kernel in the $|D|$-domain as introduced in (9). We can rewrite it in the form
= ; a{z,w)dz, 



Jo dh 
where 

Oi{z,w)^^ ^ {l+iz){l+jw)P{iz,jw), 

with l3{z,w) = log2(l + e-2*^"ii"'(--)e-2ta"h-i(«,)^^ Finally, 
let |a| and |b| denote the two symmetric densities in the \D\- 
domain. 

The claim of the theorem is then equivalent to the statement that the GEXIT functional $\int_0^1 |d|^{c_{BMS(h)}}(w)\, |a|(w)\, dw$ preserves the partial order implied by physical degradation. This means that if $|a| \prec |b|$ then

$$\int_0^1 |d|^{c_{BMS(h)}}(w)\, |a|(w)\, dw \le \int_0^1 |d|^{c_{BMS(h)}}(w)\, |b|(w)\, dw.$$

By Theorem 3.4 in [11], a $|D|$-domain kernel preserves the partial order implied by physical degradation if it is non-increasing and concave on $[0, 1]$, i.e., if its first two derivatives are non-positive. This means we need to show that

d\c^ Ms(h)\{z) d'a{z,w) ^_^ ^ ^ 
iQ dh. dw^ ~ 

for i — 1,2. By the same Theorem 3.4 the above condition 
is verified if both ^ a^^,'"'^ for i = 1,2, are convex and non- 

d^'^-^ a{z.w) 

Now some further calculus shows that 



decreasing. This in turn is true if ■ 
ther calcu 

da{z, w) 



> for i,j = 1,2. 



dw 



iz log2(l + iwz)- 



ilog(l + iw), 



(22) 



ln(2) 



d^a{z, w) 



1 



1 



Note that equation ( l23i implies that " 'gl^i^' has a positive 
expansion in z (except for the constant term). Therefore the 

= 1,2, are both positive and by 



dw'-^dz^ 



derivatives 

symmetry of the function a{z,w) in its arguments z and w 
so is ^g'^^g^-f ■ Finally, 
d'^a{z, w) 



log(2)- 



dwdz 



2 1 - 
i>a 



wz 



1 



ii + l)iw'z') 
2i+l 



which has a positive Taylor series expansion as well. This 
confirms our claim that the GEXIT kernel preserves the partial 
order implied by physical degradation. 

Appendix III
Proof of Eq. (19)

In this appendix we prove the claim (19). First notice that
^{T^{a)) = A(<8(p(a))). Since < X{x) < 1 and A'(.t) < 
A'(l), we have 

|Q3(Th,(ai))-<B(Th,(a2))| < (24) 

A'(l)i3h, I »(p(ai)) - »(p(a2))| + \B^, - ShJ 

In order to estimate |58(p(ai)) — 5B(p(a2))|, define, for t e 
[0, 1], at = (1 — t)ai + ta2, and write 

dS(p(at)) 



S(p(ai))-»(p(a2))| < 



dt 



dt. 



(25) 



The derivative of the Bhattacharyya parameter is easily computed (to lighten the notation we omit hereafter the argument of $\mathfrak{B}(\cdot)$ in the derivative). The result is most conveniently expressed in terms of densities of the variable $u = \sqrt{1 - \tanh^2(x/2)}$, where $x$ is the log-likelihood ratio (this quantity is equivalent to the $|D|$-variable and its expectation is the Bhattacharyya parameter). If we denote the corresponding densities by the same symbols, we get



d«8 
"dT 



= p'(i) / 



9 ? 



(26) 



• (a2(ui) - ai(wi)) h{u2) dui du2 , 
where we introduced the density 



*(r-2) 



Using integration by parts with respect to ui in Eq. (I26> and 
denoting by Ai , A2 the distributions corresponding to densities 

ai, a2, we get 

ui(l - ul) 



i - "l"2 

■ (A2(wi) - Ai(ui)) b(M2) dui du2 , 

Since 82 is physically degraded with respect to ai, A2{u) > 
Ai{u) for any u £ [0,1]. Furthermore J Ai{v)dv = !B(ai). 
Therefore 



(23) 

where 



^=p'(l)[»(a2)-»(ai)]S, 



(27) 



U\U2 



f{ui) b(u2)d'Uid'U2 



and / is a function on [0, 1] non negative and with unit integral. 
In other words, / is a probability density function. Since 

\/u1 + ul — u\u\ > Ul, we obtain the bound 



S < 



(1 - w^) b(u)du 



, r-2 
where we used the definition of b. If we further notice that
/o u at{u) = *B(at) > »(ai), we get 



The claim follows by putting together Eqs. i25\ . i2H . and 

(EH). 

Appendix IV 
MAP Versus BP Marginals; Some Technical 
Details 

In this appendix we present the proofs which were omitted 
in Sec. El 

Proof: [Lemma 15] Let us make a few preliminary remarks. The first one follows immediately from the definition:

$$ E\{\mu_Y(Y) \mid Z = z\} = \mu_Z(z) . \tag{29} $$

In fact, using the Markov property, the left hand side can be written as $E\{E[X \mid Y] \mid Z = z\} = E\{E[X \mid Y, Z] \mid Z = z\}$, which is equal to $E[X \mid Z = z] = \mu_Z(z)$.

The second remark is that, by elementary calculus, for any $0 \le x_0 \le x \le 1$,

$$ k(x) \le k(x_0) - \tfrac{1}{2} K (x^2 - x_0^2) . $$

Finally, for any random variable $W$ taking values in $[0, 1]$, we have (here $\mathrm{Var}(W)$ is the variance of $W$):

$$ E\, k(W) \le k(E\, W) - \tfrac{1}{2} K\, \mathrm{Var}(W) . $$

In fact, by Taylor expansion, $k(W) \le k(w_0) + k'(w_0)(W - w_0) - \tfrac{1}{2} K (W - w_0)^2$ for any $w_0 \in [0, 1]$. The claim is proved by taking expectation of both sides and setting $w_0 = E\, W$.

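These ingredients are put together as follows (here we use the shorthands $\mu_Y$ and $\mu_Z$ for, respectively, $\mu_Y(Y)$ and $\mu_Z(Z)$):

$$
\begin{aligned}
E[k(|\mu_Y|)] &= E\{E[k(|\mu_Y|) \mid Z]\} \\
&\le E\Bigl\{ k(E[|\mu_Y| \mid Z]) - \tfrac{1}{2} K\, \mathrm{Var}(|\mu_Y| \mid Z) \Bigr\} \\
&\le E\Bigl\{ k(|E[\mu_Y \mid Z]|) - \tfrac{1}{2} K \bigl( E[|\mu_Y| \mid Z]^2 - E[\mu_Y \mid Z]^2 \bigr) - \tfrac{1}{2} K\, \mathrm{Var}(|\mu_Y| \mid Z) \Bigr\} \\
&= E\Bigl\{ k(|E[\mu_Y \mid Z]|) - \tfrac{1}{2} K\, E[(\mu_Y - E[\mu_Y \mid Z])^2 \mid Z] \Bigr\} \\
&= E\Bigl\{ k(|\mu_Z|) - \tfrac{1}{2} K\, E[(\mu_Y - \mu_Z)^2 \mid Z] \Bigr\} \\
&= E[k(|\mu_Z|)] - \tfrac{1}{2} K\, E[(\mu_Y - \mu_Z)^2] ,
\end{aligned}
$$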



which completes the proof. ■ 
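The inequality just established, $E[k(|\mu_Y|)] \le E[k(|\mu_Z|)] - \tfrac{1}{2} K\, E[(\mu_Y - \mu_Z)^2]$, can be checked exactly on a small Markov chain $X \to Y \to Z$. The Python sketch below is only an illustration under stated assumptions. The function $k$ is taken to be the entropy (in nats) of a bit with bias $x$, for which $k'(0) = 0$ and $k''(x) = -1/(1 - x^2) \le -1$, so $K = 1$. This choice of $k$, the transition probabilities, and all names are hypothetical and not taken from the lemma.

```python
import math


def k(x):
    """Illustrative k (an assumption): entropy in nats of a bit with bias x."""
    p = (1.0 + x) / 2.0
    return -sum(q * math.log(q) for q in (p, 1.0 - p) if q > 0.0)


K = 1.0  # valid for the k above, since k''(x) = -1/(1 - x^2) <= -1 on [0, 1)

# Hypothetical channels: X in {+1, -1} uniform, Y in {0, 1, 2}, Z in {0, 1}.
P_Y_GIVEN_X = {+1: [0.7, 0.2, 0.1], -1: [0.1, 0.2, 0.7]}
P_Z_GIVEN_Y = [[0.9, 0.1], [0.5, 0.5], [0.1, 0.9]]


def lemma_sides():
    """Return (lhs, rhs) of E k(|mu_Y|) <= E k(|mu_Z|) - (K/2) E (mu_Y - mu_Z)^2."""
    ny, nz = 3, 2
    p_xy = {(x, y): 0.5 * P_Y_GIVEN_X[x][y] for x in (+1, -1) for y in range(ny)}
    p_y = [p_xy[(+1, y)] + p_xy[(-1, y)] for y in range(ny)]
    # mu_Y(y) = E[X | Y = y]
    mu_y = [(p_xy[(+1, y)] - p_xy[(-1, y)]) / p_y[y] for y in range(ny)]
    p_yz = [[p_y[y] * P_Z_GIVEN_Y[y][z] for z in range(nz)] for y in range(ny)]
    p_z = [sum(p_yz[y][z] for y in range(ny)) for z in range(nz)]
    # By the Markov property, mu_Z(z) = E[mu_Y(Y) | Z = z].
    mu_z = [sum(p_yz[y][z] * mu_y[y] for y in range(ny)) / p_z[z] for z in range(nz)]
    lhs = sum(p_y[y] * k(abs(mu_y[y])) for y in range(ny))
    mse = sum(p_yz[y][z] * (mu_y[y] - mu_z[z]) ** 2
              for y in range(ny) for z in range(nz))
    rhs = sum(p_z[z] * k(abs(mu_z[z])) for z in range(nz)) - 0.5 * K * mse
    return lhs, rhs
```

Calling `lemma_sides()` returns both sides of the inequality for this toy chain. Other transition matrices can be substituted, provided each row sums to one.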
Proof: [Lemma 16] We claim (and will prove later) that



^,r■\Y)-^^,{Y) 



where $l(y_i)$ is the log-likelihood associated to the channel
output $y_i$. If we square and take expectation with respect to



Y (recalling that nf' O^), Mi(^) do not depend upon Yi), we 
get 



(28) E 



Mr(r)-/i,(y) 



< CE 



The thesis follows by summing over i. 

We are left with the task of proving the first claim above. 
We recall that the conditional expectations can be represented 
in terms of extrinsic log-likelihoods as 



^J'iiy) 



tanh 
tanh 



il{y^) + Myr.^)) 



Analogous formulae hold if we replace fii{y) (respectively 
/ii(y)) with nY'\y) (respectively Aif'^(y)) and with 
4'T'^iy^i)- The claim follows immediately from the
calculus exercise below. ■
Fact 1: For any $x_1, x_2, z \in \mathbb{R}$,

$$ |\tanh(x_1 + z) - \tanh(x_2 + z)| \le e^{2|z|}\, |\tanh(x_1) - \tanh(x_2)| . $$

Proof: Consider, without loss of generality, $x_1 > x_2$ and $z < 0$. It is simple to realize that, for any $x \in \mathbb{R}$,

$$ 1 + \tanh(x + z) \le 1 + \tanh(x) , \qquad 1 - \tanh(x + z) \le e^{-2z} (1 - \tanh(x)) . $$

The last statement follows by writing $1 + e^{2(x+z)} \ge e^{2z}(1 + e^{2x})$ and taking the inverse of both sides.

The thesis is proved by multiplying these inequalities, which gives $1 - \tanh^2(x + z) \le e^{-2z}(1 - \tanh^2(x))$, and integrating over $x \in [x_2, x_1]$. ■
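Multiplying the two inequalities in the proof above and integrating yields $|\tanh(x_1 + z) - \tanh(x_2 + z)| \le e^{2|z|}\, |\tanh(x_1) - \tanh(x_2)|$. The following Python sketch tests this bound on random triples. The sampling range, tolerance, and function names are arbitrary choices made for illustration.

```python
import math
import random


def fact1_holds(x1, x2, z, tol=1e-9):
    """Check |tanh(x1+z) - tanh(x2+z)| <= exp(2|z|) |tanh(x1) - tanh(x2)|."""
    lhs = abs(math.tanh(x1 + z) - math.tanh(x2 + z))
    rhs = math.exp(2.0 * abs(z)) * abs(math.tanh(x1) - math.tanh(x2))
    return lhs <= rhs + tol  # small slack for floating-point rounding


def count_violations(trials=10000, seed=1):
    """Count random triples (x1, x2, z) in [-5, 5]^3 that violate the bound."""
    rng = random.Random(seed)
    bad = 0
    for _ in range(trials):
        x1, x2, z = (rng.uniform(-5.0, 5.0) for _ in range(3))
        if not fact1_holds(x1, x2, z):
            bad += 1
    return bad
```

The exponential factor cannot be dropped: for $x_1 = 3$, $x_2 = 2$, $z = -2.5$ the left-hand side is about 0.92, while $|\tanh(3) - \tanh(2)|$ is only about 0.03.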